POLYCYSTIC KIDNEY DISEASE 1 GENE AND USES THEREOF
BACKGROUND TO THE INVENTION 

In humans, one of the commonest of all genetic disorders is
autosomal dominant polycystic kidney disease (ADPKD), also
termed adult polycystic kidney disease (APKD), affecting
approximately 1/1000 individuals (Dalgaard, 1957). ADPKD is
a progressive disease of cyst formation and enlargement
typically leading to end stage renal disease (ESRD) in late
middle age. The major cause of morbidity in ADPKD is
progressive renal disease characterized by the formation and
enlargement of fluid filled cysts, resulting in grossly
enlarged kidneys. Renal function deteriorates as normal
tissue is compromised by cystic growth, resulting in end
stage renal disease (ESRD) in more than 50% of patients by
the age of 60 years (Gabow, et al., 1992). ADPKD accounts
for 8-10% of all renal transplantation and dialysis patients
in Europe and the USA (Gabow, 1993).

ADPKD also causes cystic growth in other organs
(reviewed in Gabow, 1990) and occasionally presents in
childhood (Fink, et al., 1993; Zerres, et al., 1993).
Extrarenal manifestations include liver cysts (Milutinovic,
et al., 1980), and more rarely cysts of the pancreas (Gabow,
1993) and other organs. Intracranial aneurysms occur in
approximately 5% of patients and are a significant cause of
morbidity and mortality due to subarachnoid haemorrhage
(Chapman, et al., 1992). ADPKD is associated with a higher
prevalence of various connective tissue disorders. An
increased prevalence of heart valve defects (Hossack, et
al., 1988), hernia (Gabow, 1990) and colonic diverticulae
(Scheff, et al., 1980) have been reported.

Considerable progress has been made in the last few
years in understanding the pathophysiology of ADPKD (and
other animal models of cystic disease). Cysts in ADPKD are
known to develop from outpouchings of descending or
ascending kidney tubules and the early stages are
characterized by a thickening and disorganization of the
basement membrane, accompanied by a de-differentiation of
tubular epithelial cells. Several of the characteristics of
ADPKD epithelia: altered growth responses; abnormal
expression of various proteins and reversal of polarity, may
be a sign of this de-differentiation and important in cyst
expansion. The nature of the primary defect which triggers
these changes is, however, unknown and consequently much
effort has been devoted to identifying the causative agent
by genetic means.

The first step towards positional cloning of an ADPKD
gene was the demonstration of linkage of one locus, now
designated the polycystic kidney disease 1 (PKD1) locus, to
the α globin cluster on the short arm of chromosome 16
(Reeders, et al., 1985). Subsequently, families with ADPKD
unlinked to markers of 16p were described (Kimberling, et
al., 1988; Romeo, et al., 1988) and a second ADPKD locus
(PKD2) has recently been assigned to chromosome region 4q13-
q23 (Kimberling, et al., 1993; Peters, et al., 1993). It is
estimated that approximately 85% of ADPKD is due to PKD1
(Peters and Sandkuijl, 1992) with PKD2 accounting for most of
the remainder. PKD2 appears to be a milder condition with a
later age of onset and ESRD (Parfrey, et al., 1990; Gabow,
et al., 1992; Ravine, et al., 1992).

The position of the PKD1 locus was refined to
chromosome band 16p13.3 and many markers were isolated from
that region (Breuning, et al., 1987; Reeders, et al., 1988;
Breuning, et al., 1990; Germino, et al., 1990; Hyland, et
al., 1990; Himmelbauer, et al., 1991). Their order, and the
position of the PKD1 locus, has been determined by extensive
linkage analysis in normal and PKD1 families and by the use
of a panel of somatic cell hybrids (Reeders et al., 1988;
Breuning, et al., 1990; Germino, et al., 1990). ADPKD is
genetically heterogenous with loci mapped not only to
16p13.3 (PKD1), but also to chromosome 4 (PKD2). Although
the phenotypes of PKD1 and PKD2 are clearly similar, it is
now well documented that PKD1 (which accounts for about 85%
of ADPKD; Peters, 1992) is a more severe disease with an
average age at ESRD of about 56 years compared to about 71.5
years for PKD2 (Ravine, 1992). An accurate long range
restriction map of the 16p13.3 region (Harris, et al., 1990;
Germino, et al., 1992) has located the PKD1 locus in an
interval of approximately 600 kb between the markers GGG1
and SM7 (Harris, et al., 1991; Somlo, et al., 1992) (see
Figure 1a). The density of CpG islands and identification
of many mRNA transcripts indicated that this area is rich in
gene sequences. Germino et al. (1992) estimated that the
candidate region contains approximately 20 genes.

Identification of the PKD1 gene from within this area
has thus proved difficult and other means to pinpoint the
disease gene have been sought. Linkage disequilibrium has
been demonstrated between PKD1 and the proximal marker VK5,
in a Scottish population (Pound, et al., 1992) and between
PKD1 and BLu24 (see Figure 1a), in a Spanish population
(Peral, et al., 1994). Studies with additional markers have
shown evidence of a common ancestor in a proportion of each
population (Peral, et al., 1994; Snarey, et al., 1994), but
the association has not precisely positioned the PKD1 locus.

Disease associated genomic rearrangements, detected by
cytogenetics or pulsed field gel electrophoresis (PFGE) have
been instrumental in the identification of various genes
associated with various genetic disorders. Hitherto, no
such abnormalities related to PKD1 have been described.
This situation contrasts with that for the tuberous
sclerosis locus which lies within 16p13.3 (TSC2). In that
case, TSC associated deletions were detected by PFGE within
the interval thought to contain the PKD1 gene and their
characterisation was a significant step toward the rapid
identification of the TSC2 gene (European Chromosome 16
Tuberous Sclerosis Consortium, 1993). The TSC2 gene
therefore maps within the candidate region for the hitherto
unidentified PKD1 gene; as polycystic kidneys are a feature
common to TSC and ADPKD1 (Bernstein and Robbins, 1991) the
possibility of an etiological link, as proposed by Kandt et
al. (1992), was considered. A contiguous gene syndrome
resulting from the disruption of PKD1 and the adjacent
tuberous sclerosis 2 (TSC2) gene, which is associated with
TSC and severe childhood onset polycystic kidney disease,
has also been defined (Brook-Carter et al., 1994).

We have now identified a pedigree in which the two
distinct phenotypes, typical ADPKD or TSC, are seen in
different members. In this family, the two individuals with
ADPKD are carriers of a balanced chromosome translocation
with a breakpoint within 16p13.3. We have located the
chromosome 16 translocation breakpoint and a gene disrupted
by this rearrangement has been defined; the discovery of
additional mutations of that gene in other PKD1 patients
shows that we have identified the PKD1 gene. Full
characterisation of the PKD1 transcript has been
significantly complicated because of the unusual genomic
region containing most of the gene. All but 3.5 kb at the
3' end of the transcript (which is about 14 kb in total) is
encoded by a region which is reiterated several times
elsewhere on the same chromosome (in 16p13.1 and termed the
HG area). The structure of the duplication is complex, with
some regions copied more times than others, and the HG
region encoding three large transcripts. The transcripts
from the HG area are: HG-A (21 kb), HG-B (17 kb) and HG-C
(8.5 kb) and although these have 3' ends which differ from
PKD1, over most of their length they share substantial
homology to the PKD1 transcript. Consequently, cloning and
characterizing a bona fide PKD1 cDNA has proven difficult.
To overcome the problem caused by duplication we have cloned
cDNAs covering the entire transcript from a cell line which
contains the PKD1 but not the HG loci. Characterisation of
these cDNAs has enabled the PKD1 protein sequence to be
predicted and led to the identification of several
homologies with described motifs.
SUMMARY OF THE INVENTION

Accordingly, in one aspect, this invention provides an
isolated, purified or recombinant nucleic acid sequence
comprising:

(a) a PKD1-encoding nucleic acid or its complementary
strand,
(b) a sequence substantially homologous to, or capable
of hybridizing to, a substantial portion of a molecule
defined in (a) above, or
(c) a fragment of a molecule defined in (a) or (b)
above.

In particular, there is provided a sequence wherein the
PKD1 gene has the nucleic acid sequence according to Fig.
15, or the partial sequence of Figs. 7 or 10. The invention
therefore includes a DNA molecule coding for a polypeptide
having the amino acid sequence of Figure 15, or a
polypeptide fragment thereof; and genomic DNA corresponding
to a molecule as in (a) - (c) above.

As used herein, "substantially homologous" refers to a
nucleic acid strand that is sufficiently duplicative of the
PKD1 sequence presented in Fig. 15 such that it is capable
of hybridizing to that sequence under moderately stringent,
and preferably stringent, conditions, as defined herein
below. Preferably, "substantially homologous" refers to a
homology of between 97 and 100%. Further, such a strand
will encode or be complementary to a strand that encodes
PKD1 protein having the biological activity described below.
As used herein, a "substantial portion of a molecule" refers
to at least 60%, preferably 80% and most preferably 90% of
the molecule in terms of its linear residue length or its
molecular weight. "Nucleic acid" refers to both DNA and
RNA.
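The numerical thresholds in these definitions can be illustrated with a minimal sketch. It assumes that "homology" is approximated by percent identity over two sequences that have already been aligned to equal length; the specification itself defines homology through hybridization behaviour, so this is only a rough in-silico proxy. The function names and the toy sequences are hypothetical and are not taken from Fig. 15.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


def is_substantially_homologous(candidate: str, reference: str,
                                min_identity: float = 97.0) -> bool:
    """Apply the preferred 97-100% threshold (identity used as an assumed proxy)."""
    return percent_identity(candidate, reference) >= min_identity


def is_substantial_portion(fragment_len: int, molecule_len: int,
                           min_fraction: float = 0.60) -> bool:
    """A 'substantial portion' is at least 60% of the linear residue length."""
    return fragment_len / molecule_len >= min_fraction


# Toy 100-nt reference (placeholder, not PKD1 sequence).
reference = "ACGT" * 25
two_mismatches = "TT" + reference[2:]
print(percent_identity(two_mismatches, reference))
print(is_substantially_homologous(two_mismatches, reference))
print(is_substantial_portion(60, 100))
```

Raising `min_fraction` to 0.80 or 0.90 gives the preferred and most preferred readings of "substantial portion".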

The PKD1 gene described herein is a gene found on human
chromosome 16, and the results of studies described herein
form the basis for concluding that this PKD1 gene encodes a
protein called PKD1 protein which has a role in the
prevention or suppression of ADPKD. The PKD1 gene therefore
includes the DNA sequences shown in Figure 15, and all
functional equivalents. By "functional equivalents" we
mean nucleic acid sequences that are substantially
homologous to the PKD1 nucleic acid sequence, as presented
in Fig. 15, and encoding a protein that possesses one or
more of the biological functions or activities of PKD1;
i.e., that is involved in cell/cell adhesion, cell/cell
recognition or cell/cell communication; for example to
effect adhesion of cells to other cells or components of the
extracellular matrix; effect communication and/or
interaction between epithelial cells and the basal membrane
(whether in kidneys or otherwise); assist in development of
connective tissue such as assembly and/or maintenance of the
basal membrane; in signal transduction between cells or
cells and components of the extracellular matrix; and/or to
promote binding of cells carrying proteins such as integrins
or carbohydrates to target cells. The biological function
of PKD1 of course includes maintaining a healthy
physiological state; that is, the native protein's
aberrations or absence results in ADPKD or an associated
disorder.

The PKD1 gene may furthermore include regulatory
regions which control the expression of the PKD1 coding
sequence, including promoter, enhancer and terminator
regions. Other DNA sequences such as introns spliced from
the end-product PKD1 RNA transcript are also encompassed.
Although work has been carried out in relation to the human
gene, the corresponding genetic and functional sequences
present in lower animals are also encompassed.

The present invention therefore further provides a PKD1
gene or its complementary strand having the sequence
according to Figure 15 which gene or strand is mutated in
some ADPKD patients (more specifically, PKD1 patients).
Therefore, the invention further provides a nucleic acid
sequence comprising a mutant PKD1 gene as described herein,
including wherein Intron 43 as defined hereinbelow has a
deletion of 18 or 20bp resulting in an intron of 57 or 55bp,
respectively. As used herein, "PKD1 mutant" or "mutation"
encompasses alterations of the native PKD1 nucleotide or
amino acid sequence, as defined by Fig. 15, i.e.,
substitutions, deletions or additions, and also encompasses
deletion of DNA containing the entire PKD1 gene.

The invention further provides a nucleic acid sequence
comprising a mutant PKD1 gene, especially one selected from
a sequence comprising a partial sequence according to
Figures 7 and/or 10, or the corresponding sequences
disclosed in Fig. 15, when:

(a) [OX114] base pairs 1746-2192 as defined in Figure
7 are deleted (446bp);
(b) [OX32] base pairs 3696-3831 as defined in Figure
7 are deleted by a splicing defect;
(c) [OX875] about 5.5kb flanked by the two XbaI sites
shown in Figure 3a are deleted and the EcoRI site separating
the CW10 (41kb) and JH1 (18kb) sites is thereby absent;
(d) [WS53] about 100kb extending between the JH1 and
CW21 and the SM6 and JH17 sites shown in Figure 6 and the
PKD1 gene is thereby absent, the deletion lying proximally
between SM6 and JH13;
(e) [461] 18bp are deleted in the 75bp intron
amplified by the primer pair 3A3C insert at position 3696 of
the 3' sequence, as shown in Figure 11;
(f) [OX1054] 20bp are deleted in the 75bp intron
amplified by the primer pair 3A3C insert at position 3696 of
the 3' sequence as shown in Figure 11;
(g) [WS212] about 75kb are deleted between SM9-CW9
distally and the PKD1 3' UTR proximally as shown in Figure
12;
(h) [WS-215] about 160kb are deleted between CW20 and
SM6-JH17 as shown in Figure 12;
(i) [WS-227] about 50kb are deleted between CW20 and
JH11 as shown in Figure 12;
(j) [WS-219] about 27kb are deleted between JH1 and
JH6 as shown in Figure 12;
(k) [WS-250] about 160kb are deleted between CW20 and
Blu24 as shown in Figure 12;
(l) [WS-194] about 65kb is deleted between CW20 and
CW10.
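The arithmetic behind items (a), (e) and (f) of the list above can be checked with a short sketch. The class and function names are hypothetical, the patient labels are used only as dictionary-style tags, and the span calculation assumes the size quoted in item (a) is simply the difference of the two base-pair coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntronDeletion:
    """A small deletion inside an intron of known length (hypothetical record type)."""
    patient: str
    intron_bp: int
    deleted_bp: int

    @property
    def remaining_bp(self) -> int:
        """Length of the intron left after the deletion."""
        if not 0 <= self.deleted_bp <= self.intron_bp:
            raise ValueError("deletion cannot exceed the intron length")
        return self.intron_bp - self.deleted_bp


def span_bp(start: int, end: int) -> int:
    """Size of a deleted span, taken as the difference of its coordinates."""
    return end - start


# Items (e) and (f): 18 bp and 20 bp deletions in the 75 bp intron.
INTRON_DELETIONS = [
    IntronDeletion("461", intron_bp=75, deleted_bp=18),
    IntronDeletion("OX1054", intron_bp=75, deleted_bp=20),
]

for d in INTRON_DELETIONS:
    print(d.patient, d.remaining_bp)

# Item (a): base pairs 1746-2192, quoted as 446 bp.
print(span_bp(1746, 2192))
```

The remaining intron lengths agree with the 57 bp and 55 bp introns mentioned earlier for Intron 43.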

The invention therefore extends to RNA molecules
comprising an RNA sequence corresponding to any of the DNA
sequences set out above. Such molecule may be the
transcript, reference PBP and identifiable with respect to
the restriction map of Figure 3a and having a length of
about 14 kb.

In another aspect, the invention provides a nucleic
acid probe having a sequence as set out above; in
particular, this invention extends to a purified nucleic
acid probe which hybridizes to at least a portion of the DNA
or RNA molecule of any of the preceding sequences.
Preferably, the probe includes a label such as a radiolabel,
for example, a 32P label.

In another aspect, this invention provides a purified
DNA or RNA coding for a protein comprising the amino acid
sequence of Figure 15, or a protein polypeptide having
homologous properties with said protein, or having at least
one functional domain or active site in common with said
protein.

The DNA molecule defined above may be incorporated in
a recombinant cloning vector for expressing a protein having
the amino acid sequence of Figure 15, or a protein or a
polypeptide having at least one functional domain or active
site in common with said protein. Such a vector may include
any vector for expression in bacteria, e.g., E. coli; yeast,
insect, or mammalian cells.

The invention also features a nucleic acid probe for
detecting PKD1 nucleic acid comprising 10 consecutive
nucleotides as presented in Fig. 15. Preferably, the probe
may comprise 15, 20, 50, 100, 200, or 300, etc., consecutive
nucleotides (nt) presented in Fig. 13, and may fall within
the size range 15nt-13kb, 100nt-5kb, 150nt-4kb, 300nt-2kb,
and 500nt-1kb.

Probes are used according to the invention in
hybridization reactions to identify PKD1 sequences, whether
they be native or mutated PKD1 DNA or RNA, as disclosed
herein. Such probes are useful for identifying the PKD1
gene or a mutation thereof, as defined herein.
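As a rough illustration of what "a probe comprising 10 consecutive nucleotides" means in practice, the sketch below enumerates every 10-nt window of a sequence and tests whether a probe, or its reverse complement, occurs exactly in a target string. This is an assumed in-silico stand-in only: real hybridization tolerates mismatches and depends on the wash conditions, and the toy sequence here is a placeholder, not the sequence of Fig. 15. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def candidate_probes(sequence: str, length: int = 10) -> list:
    """All windows of `length` consecutive nucleotides from `sequence`."""
    seq = sequence.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def probe_matches(probe: str, target: str) -> bool:
    """Exact-match check on either strand (a crude proxy for hybridization)."""
    target = target.upper()
    probe = probe.upper()
    return probe in target or reverse_complement(probe) in target


toy_target = "ACGTACGTTAGCCGATAGGCT"  # placeholder sequence
probes = candidate_probes(toy_target, length=10)
print(len(probes), probes[0])
print(probe_matches(reverse_complement(probes[0]), toy_target))
```

Longer probes (15, 20, 50 nt and so on) are obtained by changing `length`.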

The invention also features a synthetic polypeptide
corresponding in amino acid residue sequence to at least a
portion of the sequence of naturally occurring PKD1, and
having a molecular weight equal to or less than that of the
native protein. A synthetic polypeptide of the invention is
useful for inducing the production of antibodies specific
for the synthetic polypeptide and that bind to naturally
occurring PKD1.

Preferred embodiments of this aspect of the invention
include a group of synthetic polypeptides whose members
correspond to a fragment of the PKD1 protein comprising a
stretch of amino acids of at least 8, and preferably 15, 30,
50, or 100 residues in length from the sequence disclosed in
Fig. 15.

In another aspect, the invention provides a polypeptide
encoded by a sequence as set out above, or having the amino
acid sequence according to the amino acid sequence of Figure
15, or a protein or polypeptide having homologous properties
with said protein, or having at least one functional domain
or active site in common with said protein. In particular,
there is provided an isolated, purified or recombinant
polypeptide comprising a PKD1 protein or a mutant or variant
thereof or encoded by a sequence set out above or a variant
thereof having substantially the same activity as the PKD1
protein. The present invention may further comprise a
polypeptide having 9 or 13 transmembrane pairs instead of 11
transmembrane domains as described hereinbelow. Further
comprising this invention is a molecule which interacts with
a polypeptide as herein described which molecule synergises,
causes, enhances or is necessary for the functioning of the
PKD1 protein as herein described.

The invention also encompasses recombinant expression
vectors comprising a nucleic acid or isolated DNA encoding
PKD1 and a process for preparing PKD1 polypeptide,
comprising culturing a suitable host cell comprising the
vector under conditions suitable for promoting expression of
PKD1, and recovering said PKD1.

This invention also provides an in vitro method of
determining whether an individual is at risk of a PKD1-
associated disorder, comprising assaying a biological sample
from the individual to determine the presence and/or amount
of PKD1 protein or polypeptide having the amino acid
sequence of Figure 15.

As used herein, "biological sample" includes any fluid
or tissue sample from a mammal, preferably a human,
including but not limited to blood, urine, saliva, any body
organ tissue, cells from any body tissue, including blood
cells.

Additionally or alternatively, a sample may be assayed
to determine the presence and/or amount of mRNA coding for
the protein or polypeptide having the amino acid sequence of
Figure 15, or to determine the fragment lengths of fragments
of nucleotide sequences coding for the protein or
polypeptide of Figure 15, or to detect inactivating
mutations in DNA coding for a protein having the amino acid
sequence of Figure 15 or a protein having homologous
properties. The screening preferably includes applying a
nucleic acid amplification process, as described herein in
detail, to said sample to amplify a fragment of the DNA
sequence. The nucleic acid amplification process
advantageously utilizes at least one of the following sets
of primers as identified herein: AH3 F9 : AH3 B7;
3A3 C1 : 3A3 C2; and AH4 F2 : JH14 B3.

Alternatively, the screening method may comprise
digesting the sample DNA to provide EcoRI fragments and
hybridizing with a DNA probe which hybridizes to the EcoRI
fragment identified (A) in Figure 3(a), and the DNA probe
may comprise the DNA probe CW10 identified herein.

Another screening method may comprise digesting the
sample to provide BamHI fragments and hybridizing with a DNA
probe which hybridizes to the BamHI fragment identified (B)
in Figure 3(a), and the DNA probe may comprise the DNA probe
1A1H.6 identified herein.

A method according to the present invention may
comprise detecting a PKD1-associated disorder in a patient
suspected of having or having predisposition to the disorder
(i.e., a carrier), the method comprising detecting the
presence of and/or evaluating the characteristics of PKD1
DNA, PKD1 mRNA and/or PKD1 protein in a sample taken from
the patient. Such method may comprise detecting and/or
evaluating whether the PKD1 DNA is deleted, missing,
mutated, aberrant or not expressing normal PKD1 protein.
One way of carrying out such a method comprises: A. taking
a biological, tissue or biopsy sample from the patient; B.
detecting the presence of and/or evaluating the
characteristics of PKD1 DNA, PKD1 mRNA and/or PKD1 protein
in the sample to obtain a first set of results; C. comparing
the first set of results with a second set of results
obtained using the same or similar methodology for an
individual that is not suspected of having the disorder; and
if the first and second sets of results differ in that the
PKD1 DNA is deleted, missing, aberrant, mutated or not
expressing PKD1 protein then that is indicative of the
presence, predisposition or tendency of the patient to
develop the disorder. As used herein, a "PKD1-associated
disorder" refers to adult polycystic kidney disease, as
described herein, and also refers to tuberous sclerosis, as
well as other disorders having symptoms such as cyst
formation in common with these diseases.

A specific method according to the invention comprises
extracting from a patient a sample of PKD1 DNA or DNA from
the PKD1 locus purporting to be PKD1 DNA, cultivating the
sample in vitro and analyzing the resulting protein, and
comparing the resulting protein with normal PKD1 protein
according to the well-established Protein Truncation Test.
Less sensitive tests include analysis of RNA using RT PCR
(reverse transcriptase - polymerase chain reaction), and
examination of genomic DNA.

Step C of the above method may be replaced by:
comparing the first set of results with a second set of
results obtained using the same or similar methodology in an
individual that is known to have the or at least one of the
disorder(s); and if the first and second sets of results are
substantially identical, this indicates that the PKD1 DNA in
the patient is deleted, mutated or not expressing normal
PKD1 protein.

The invention further provides a method of
characterizing a mutation in a subject suspected of having
a mutation in the PKD1 gene, which method comprises: A.
amplifying each of the exons in the PKD1 gene of the
subject; B. denaturing the complementary strands of the
amplified exons; C. diluting the denatured separate,
complementary strands to allow each single-stranded DNA
molecule to assume a secondary structural conformation; D.
subjecting the DNA molecule to electrophoresis under non-
denaturing conditions; E. comparing the electrophoresis
pattern of the single-stranded molecule with the
electrophoresis pattern of a single-stranded molecule







WO 95/34649 PCT/GB95/01386 


containing the same amplified exon from a control individual
which has either a normal or PKD1 heterozygous genotype;
and F. sequencing any amplification product which has an
electrophoretic pattern different from the pattern obtained
from the DNA of the control individual.
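The comparison in steps E and F can be sketched as a simple decision rule. The sketch assumes, purely for illustration, that each electrophoresis pattern is recorded as a list of band migration distances in millimetres and that two patterns count as the same when they have the same number of bands within a small tolerance. The exon labels, distances and tolerance are hypothetical and are not data from the specification.

```python
def patterns_differ(patient_bands, control_bands, tolerance_mm=0.5):
    """True if two band patterns differ in band count or band position."""
    if len(patient_bands) != len(control_bands):
        return True
    pairs = zip(sorted(patient_bands), sorted(control_bands))
    return any(abs(p - c) > tolerance_mm for p, c in pairs)


def exons_to_sequence(patient_patterns, control_patterns, tolerance_mm=0.5):
    """Step F: list the amplified exons whose pattern differs from the control."""
    return [
        exon
        for exon, bands in patient_patterns.items()
        if patterns_differ(bands, control_patterns[exon], tolerance_mm)
    ]


# Hypothetical migration distances (mm) for two amplified exons.
patient = {"exon_A": [12.0, 15.5], "exon_B": [10.0, 14.0, 16.2]}
control = {"exon_A": [12.1, 15.4], "exon_B": [10.0, 14.0]}
print(exons_to_sequence(patient, control))
```

Only the exons flagged by this comparison would then go forward to sequencing, which keeps the sequencing effort proportional to the number of aberrant patterns.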

The invention also extends to a diagnostic kit for
carrying out a method as set out above, comprising nucleic
acid primers for amplifying a fragment of the DNA or RNA
sequences defined above, and packaging means therefor. The
kit may optionally include written instructions stating that
the primers are to be used for detection of disorders
associated with the PKD1 gene. The nucleic acid primers may
comprise at least one of the following sets: AH3 F9 : AH3
B7; 3A3 C1 : 3A3 C2; and AH4 F2 : JH14 B3.

Another embodiment of kit may combine one or more
substances for digesting a sample to provide EcoRI fragments
and a DNA probe as previously defined. A further embodiment
of kit may comprise one or more substances for digesting a
sample to provide BamHI fragments and a DNA probe as
previously defined.

A vector (such as Bluescript (available from
Stratagene)) comprising a nucleic acid sequence set out
above; and a host cell (such as E. coli strain XL-1 Blue
(available from Stratagene)) transfected or transformed with
the vector are also provided, together with the use of such
a vector or a nucleic acid sequence set out above in gene
therapy and/or in the preparation of an agent for treating
or preventing a PKD1-associated disorder.

Therefore, there is further provided a method of
treating or preventing a PKD1-associated disorder which
method comprises administering to a patient in need thereof
a functional PKD1 gene to affected cells in a manner that
permits expression of PKD1 protein therein and/or a
transcript produced from a mutated chromosome (such as the
deleted WS-212 chromosome) which is capable of expressing
functional PKD1 protein therein.

As used herein, the term "hybridization" refers to
conventional DNA/DNA or DNA/RNA hybridization conditions.
For example, for a DNA or RNA probe of about 10 - 50
nucleotides, moderately stringent hybridization conditions
are preferred and include 10X SSC, 5X Denhardts, 0.1% SDS,
at 35 - 50 degrees for 15 hours; for a probe of about 50 -
300 nucleotides, "stringent" hybridization conditions are
preferred, and refer to hybridization in 6X SSC, 5X
Denhardts, 0.1% SDS at 65 degrees for 15 hours.
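The two sets of conditions can be captured as a small lookup keyed on probe length. This is a sketch under stated assumptions: temperatures are taken to be degrees Celsius, which the text does not state explicitly; a probe of exactly 50 nucleotides is assigned to the moderately stringent set, since the two quoted ranges meet at 50; and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HybridizationConditions:
    stringency: str
    ssc: str
    denhardts: str
    sds_percent: float
    temperature_c: Tuple[int, int]  # (min, max); Celsius is an assumption
    hours: int


MODERATE = HybridizationConditions(
    "moderately stringent", "10X", "5X", 0.1, (35, 50), 15)
STRINGENT = HybridizationConditions(
    "stringent", "6X", "5X", 0.1, (65, 65), 15)


def conditions_for_probe(length_nt: int) -> HybridizationConditions:
    """Pick the preferred conditions for a probe of the given length."""
    if 10 <= length_nt <= 50:
        return MODERATE   # about 10 - 50 nucleotides
    if 50 < length_nt <= 300:
        return STRINGENT  # about 50 - 300 nucleotides
    raise ValueError("no preferred conditions are given for this probe length")


print(conditions_for_probe(20))
print(conditions_for_probe(200))
```

Probe lengths outside 10 - 300 nucleotides raise an error rather than guessing, because the passage gives no conditions for them.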

The present invention further provides the use of PKD1
protein or polycystin or a mutant or variant thereof having
substantially the same biological activity, in
therapy. In particular, to effect cell adhesion,
recognition or communication, for example to effect adhesion
of cells to other cells or components of the extracellular
matrix; effect communication and/or interaction between
epithelial cells and the basal membrane (whether in kidneys
or otherwise); assisting in development of connective tissue
such as assembly and/or maintenance of the basal membrane;
in signal transduction between cells or cells and components
of the extracellular matrix; and/or to promote binding of
cells carrying proteins such as integrins or carbohydrates
to target cells.

Accordingly, where it is preferred to administer the
polypeptide directly to a patient in need thereof, the
invention further provides the use of a PKD1 protein or
polycystin in the preparation of a medicament. Therefore,
there is also provided a pharmaceutical formulation
comprising a PKD1 protein, functional PKD1 gene and/or a
transcript produced from a mutated chromosome which is
capable of expressing functional PKD1 protein, in
association with a pharmaceutically acceptable carrier
therefor.

The invention also features an immunoglobulin, i.e., a
polyclonal or monoclonal antibody specific for an epitope of
PKD1, which epitope is found in the amino acid sequence
presented in Fig. 15.

The invention also features a method of assaying for
the presence of PKD1 in a sample of mammalian, preferably
human cells, comprising the steps of: (a) providing an
antibody specific for said PKD1; and (b) assaying for the
presence of PKD1 by admixing an aliquot from a sample of
mammalian cells with antibody under conditions sufficient to
allow for formation and detection of an immune complex of
PKD1 and the antibody. Such method is useful for detecting
disorders involving aberrant expression of the PKD1 gene or
processing of the protein, as described herein.

Preferably, this method includes providing a monoclonal
antibody specific for an epitope that is antigenically the
same, as determined by Western blot assay, ELISA or
immunocytochemical staining, and substantially corresponds
in amino acid sequence to the amino acid sequence of a
portion of PKD1 and having a molecular weight equal to or
less than that of PKD1.

The invention thus also features a kit for detecting
PKD1, the kit including at least one package containing an
antibody or idiotype-containing polyamide portion of an
antibody raised to a synthetic polypeptide of this invention
or to a conjugate of that polypeptide bound to a carrier.
An indicating group or label is utilized to indicate the
formation of an immune reaction between the antibody and
PKD1 when the antibody is admixed with tissue or cells.

Further features will become more fully apparent in the
following description of the embodiments of this invention
and from the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Before describing preferred embodiments of the
invention in detail, the drawings will briefly be described.

Figure 1a (top): A long range map of the terminal
region of the short arm of chromosome 16 showing the PKD1
candidate region defined by genetic linkage analysis. The
positions of selected DNA probes and microsatellites used
for haplotype, linkage or heterozygosity analyses are
indicated. Markers previously described in linkage
disequilibrium studies are shown in bold (from: Harris, et
al., 1990; Harris, et al., 1991; Germino, et al., 1992;
Somlo, et al., 1992; Peral, et al., 1994; Snarey, et al.,
1994).

(bottom): A detailed map of the distal part of the
PKD1 candidate region showing: the area of 16p13.3
duplicated in 16p13.1 (hatched); C, Cla I restriction sites;
the breakpoints in the somatic cell hybrids, N-OH1 and
P-MWH2A; DNA probes and the TSC2 gene. The limits of the
position of the translocation breakpoint found in family 77
(see b), determined by evidence of heterozygosity (in 77-4)
and PFGE (see c and text), are also indicated. The contig
covering the 77 breakpoint region consists of the cosmids:
1, CW9D; 2, ZDS5; 3, JH2A; 4, REP59; 5, JC10.2B; 6, CW10III;
7, SM25A; 8, SMII; 9, NM17.

Figure 1b: Pedigree of family 77 which segregates a
16;22 translocation, showing the chromosomal composition of
each subject. Individuals 77-2 and 77-3 have the balanced
products of the exchange and have PKD1; 77-4 is monosomic
for 16p13.3-->16pter and 22q11.21-->22pter and has TSC.

Figure 1c: PFGE of DNA from members of the 77 family:
77-1 (1); 77-2 (2); 77-3 (3); 77-4 (4); digested with Cla I
and hybridised with SM6. In addition to the normal
fragment of 340 kb and a partially digested fragment of 480 kb,
a proximal breakpoint fragment of approximately 100 kb
(arrowed) is seen in individuals 77-2, 77-3 and 77-4,
concordant with segregation of the der(16) chromosome.

Figure 2: FISH of the cosmid CW10III (cosmid 6; Figure
1a) to a normal male metaphase. Duplication of this locus
is illustrated with two sites of hybridisation on 16p; the
distal site (the PKD1 region) is arrowed. The signal from
the proximal site (16p13.1) is stronger than that from the
distal, indicating that sequences homologous to CW10III are
reiterated in 16p13.1.

Figure 3a: A detailed map of the 77 translocation
region showing the precise localisation of the 77 breakpoint
and the region that is duplicated in 16p13.1 (hatched). DNA
probes (open boxes); the transcripts, PKD1 and TSC2 (filled
boxes; with direction of transcription indicated by an
arrow) and cDNAs (grey boxes) are shown below the genomic
map. The known genomic extent of each gene is indicated at
the bottom of the diagram and the approximate genomic
location of each cDNA is indicated under the genomic map.
The positions of genomic deletions found in PKD1 patients,
OX875 and OX114, are also indicated. Restriction sites for
EcoR I (E) and incomplete maps for BamH I (B), Sac I (S) and
Xba I (X) are shown. SM3 is a 2 kb BamH I fragment shown at
the 5' end of the gene.

Figure 3b: Southern blots of BamH I digested DNA from
individuals: 77-1 (1); 77-2 (2); and 77-4 (4) hybridised
with: left panel, 8S3 and right panel, 8S1 (see a). 8S3
detects a novel fragment on the telomeric side of the
breakpoint (12 kb: arrowed), associated with the der(22)
chromosome, in 77-2, but not 77-4; 8S1 identifies a novel
fragment on the centromeric side of the breakpoint (9 kb:
arrowed) associated with the der(16) chromosome in 77-2
and 77-4. The telomeric breakpoint fragment is also seen
weakly with 8S1 (arrowed), indicating that the breakpoint
lies in the distal part of 8S1. The 8S3 and 8S1 loci are
both duplicated; the normal BamH I fragment detected at the
16p13.3 site by these probes is 11 kb (see a), but a similar
sized fragment is also detected at the 16p13.1 site.
Consequently, the breakpoint fragments are much fainter than
the normal (16p13.1 plus 16p13.3) band.

Figure 4a: PBP cDNA, 3A3, hybridised to a Northern
blot containing about 1 µg polyA selected mRNA per lane of
the tissue specific cell lines: lane 1, MJ, EBV-transformed
lymphocytes; lane 2, K562, erythroleukemia; lane 3, FS1,
normal fibroblasts; lane 4, HeLa, cervical carcinoma; lane
5, G401, renal Wilms' tumour; lane 6, Hep3B, hepatoma; lane
7, HT29, colonic adenocarcinoma; lane 8, SW13, adrenal
carcinoma; lane 9, G-CCM, astrocytoma. A single transcript
of approximately 14 kb is seen; the highest level of
expression is in fibroblasts and in the astrocytoma cell
line, G-CCM. Although in this comparative experiment little
expression is seen in lanes 1, 4 and 7, we have demonstrated
at least a low level of expression in these cell lines on
other Northern blots and by RT-PCR (see later).

Figure 4b: A Northern blot containing about 20 µg of
total RNA from the cell line G-CCM hybridised with cDNAs or
a genomic probe which identify various parts of the PBP
gene. Left panel, a single about 14 kb transcript is seen
with a cDNA from the single copy area, 3A3. Right panel, a
cDNA, 21P.9, that is homologous to parts of the region that
is duplicated (JH12, JH8 and JH10; see Figure 3a) hybridises
to the PBP transcript and three novel transcripts: HG-A
(about 21 kb), HG-B (about 17 kb) and HG-C (8.5 kb). A
similar pattern of transcripts is seen with cDNAs and
genomic fragments that hybridise to the area between JH5 and
JH13, with the exception of the JH8 area. Middle panel, JH8
hybridises to the transcripts PBP, HG-A and HG-B but not to
HG-C.

Figure 4c: A Northern blot of 20 µg total fibroblast
RNA from: normal control (N); 77-2 (2); 77-4 (4) hybridised
with 8S1, which contains the 16;22 translocation breakpoint
(see Figure 3). A transcript of about 9 kb (PBP-77) is
identified in the two patients with this translocation but
not in the normal control. PBP-77 is a chimeric PBP
transcript formed due to the translocation and is not seen
in 77-2 or 77-4 RNA with probes which map distal to the
breakpoint.

Figure 5a: FIGE of DNA from: normal (N) and ADPKD
patient OX875 (875) digested with EcoR I and hybridised
with, left panel, CW10; middle panel, JH1. Normal fragments
of 41 kb (plus a 31 kb fragment from the 16p13.1 site),
CW10, and 18 kb, JH1, are identified with these probes;
OX875 has an additional 53 kb band (arrowed). The EcoR I
site separating these two fragments is removed by the
deletion (see Figure 3a). The right panel shows a Southern
blot of BamH I digested DNA (as above) hybridised with
1A1H.6. A novel fragment of 9.5 kb is seen in OX875 DNA, as
well as the normal 15 kb fragment. These results indicate
that OX875 has a 5.5 kb deletion; its position was
determined more precisely by mapping relative to two Xba I
sites which flank the deletion (see Figure 3a).
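
The fragment arithmetic behind the Figure 5a legend can be sketched as follows. This is an illustrative check only, not part of the original analysis; the function name `deletion_from_fragments` is hypothetical, and the sizes are the approximate gel estimates quoted in the legend.

```python
def deletion_from_fragments(normal_kb, novel_kb):
    """Estimate a deletion (kb) as the summed normal fragments minus the novel fragment."""
    return sum(normal_kb) - novel_kb

# BamH I digest probed with 1A1H.6: normal 15 kb fragment, novel 9.5 kb fragment.
bamh1_estimate = deletion_from_fragments([15.0], 9.5)

# EcoR I digest: the site separating the 41 kb (CW10) and 18 kb (JH1)
# fragments is removed, so the two fuse into the single 53 kb band.
ecor1_estimate = deletion_from_fragments([41.0, 18.0], 53.0)

print(bamh1_estimate)  # 5.5
print(ecor1_estimate)  # 6.0
```

The two digests give estimates of 5.5 kb and about 6 kb, which is the kind of agreement one would expect given the coarser sizing of large fragments by FIGE.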

Figure 5b: Northern blot of total fibroblast RNA, as
(a), hybridised with the cDNAs, AH4, 3A3 and AH3. A novel
transcript (PBP-875) of about 11 kb is seen with AH4 (the
band is reduced in intensity because the probe is partly
deleted) and AH3 (arrowed), which flank the deletion, but
not 3A3 which is entirely deleted (see Figure 3a). The
transcripts HG-A, HG-B and HG-C, from the duplicated area,
are seen with AH3 (see Figure 4b).

Figure 5c: Left panel; FIGE of DNA from: normal (N)
and ADPKD patient OX114 (114), digested with EcoR I and
hybridised with CW10; a novel fragment of 39 kb (arrowed) is
seen in OX114. Middle panel; DNA, as above, plus the normal
mother (M) and brother (B) of OX114 digested with BamH I and
hybridised with CW21. A larger than normal fragment of 19
kb (arrowed) was detected in OX114 but not other family
members due to deletion of a BamH I site; together these
results are consistent with a 2 kb deletion (see Figure 3a).
Right panel; RT-PCR of RNA, as above, with primers flanking
the OX114 deletion (see Experimental Procedures). A novel
fragment of 810 bp (arrowed) is seen in OX114, indicating a
deletion of 446 bp in the PBP transcript.

Figure 5d: RT-PCR of RNA from: ADPKD patient OX32
(32) plus the proband's normal mother (M) and affected
father (F) and sibs (1) and (2) using the C primer pair from
3A3 (see Experimental Procedures). A novel fragment of 125
bp is detected in each of the affected individuals.

Figure 6: Map of the region containing the TSC2 and
PBP genes showing the area deleted in patient WS-53 and the
position of the 77 translocation breakpoint. Localisation
of the distal end of the WS-53 deletion was described
(European Chromosome 16 Tuberous Sclerosis Consortium, 1993)
and we have now localised the proximal end between SM6 and
JH17. The size of the aberrant Mlu I fragment in WS-53,
detected by JH1 and JH17, is 90 kb and these probes lie on
adjacent Mlu I fragments of 120 kb and 70 kb, respectively.
Therefore the WS-53 deletion is about 100 kb. Restriction
sites for: Mlu I (M); Nru I (R); Not I (N); and partial
maps for Sac II (S) and BssH II (H) are shown. DNA probes
(open boxes) and the TSC2 and PBP transcripts (filled boxes)
are indicated below the line with their known genomic
extents (brackets). The locations of the microsatellites
KG8 and SM6 are also indicated.

Figure 7: The partial nucleotide sequence (cDNA) of
the PKD1 transcript extending 5631 bp to the 3' end of the
gene. The corresponding predicted protein (also shown in
SEQ ID NO: 4) is shown below the sequence and extends from
the start of the nucleotide sequence. The GT-repeat, KG8,
is in the 3' untranslated region, between 5430-5448 bp.
This sequence corresponds to GenBank Accession No. L33243
and is shown in SEQ ID NO: 3.

Figure 8: The sequence of the probe 1A1H0.6 (also
shown in SEQ ID NO: 5).

Figure 9: The sequence (SEQ ID NO: 6) of the probe
CW10 which is about 0.5 kb.

Figure 10: The larger partial nucleotide sequence (SEQ
ID NO: 1) of the PKD1 transcript (cDNA) extending from bp
2 to 13807 bp to the 3' end of the gene together with the
corresponding predicted protein (also shown in SEQ ID NO:
2). This larger partial sequence encompasses the (smaller)
partial sequence of Figure 7 from amino acid no. 2726 in SEQ
ID NO: 3 and relates to the entire PKD1 gene sequence apart
from its extreme 5' end.

Figure 11: A map of the 75 bp intron amplified by the
primer set 3A3C insert at position 3696 of the 3' sequence
showing the positions of genomic deletions found in PKD1
patients 461 and OX1054.

Figure 12: A map of the region of chromosome 16
containing the TSC2 and PKD1 genes showing the areas
affected in patients WS-215, WS-250, WS-212, WS-194, WS-227
and WS-219; also WS-53 (but cf. Figure 6). Genomic sites
for the enzymes Mlu I (M), Cla I (C), Pvu I (P) and Nru I (R)
are shown. Positions of single-copy probes and cosmids used
to screen for deletions are shown below the line which
represents about 400 kb of genomic DNA. The genomic
distribution of the approximately 45 kb TSC2 gene and known
extent of the PKD1 gene are indicated above. The hatched
area represents an about 50 kb region which is duplicated
more proximally on chromosome 16p.

Figure 13 is a genomic map of the PKD1 gene. (Top) A
restriction map of the genomic area containing the PKD1 gene
showing sites for BamH I (B), EcoR I (E) and partial maps for
Xba I (X) and Hind III (H), and the duplicated area (hatched).
The position of genomic clones and the cosmid JH2A are shown
above the map (open boxes). The positions of the 46 exons
of the PKD1 gene are shown below the map (solid boxes,
translated areas; open boxes, untranslated regions; UTRs).
Each 5th exon is numbered and the direction of transcription
arrowed. The area sequenced in Figs. 7 and 10 is bracketed
and the approximate location of the 3' end of the TSC2 gene
is shown on the left (dashed line and hatched box).
(Bottom) The cDNA contig covering the PKD1 transcript. The
cDNAs are: 1, rev1; 2, S13; 3, S3/4; 4, S1/3; 5, GAP e; 6, GAP
d; 7, GAP g; 8, GAP a (see Table 2 for details); 9, A1C; 10,
AH3; 11, 3A3; 12, AH4.

Figure 14(a) (Top): Map of the genomic BamH I
fragment, SM3, which contains the CpG island at the 5' end of
the PKD1 gene, showing the probe CW45 (open box). Genomic
restriction sites for the methylation sensitive enzymes
Sac II (S), Not I (N), Mlu I (M) and BssH II (H) are
illustrated. The approximate position of the DNase I
hypersensitive site is also shown (large arrow), plus the
location of the first exon including the proposed
transcription start site (small arrow), the 5'UTR (open box)
and the translated region (solid bar). (Bottom) The GC
content across the area is plotted with a window size of 50
nt. A peak of GC content of over 80% is seen in the area of
the transcriptional start site and the first exon. A
corresponding lack of CpG suppression was also found with an
average CpG/GpC ratio of 0.84 between 800-1,800 bp.
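
A windowed GC profile and a dinucleotide ratio of the kind described in the Figure 14(a) legend could be computed along the following lines. This is a minimal sketch under stated assumptions: the legend gives only the 50 nt window size, so the one-base step, the reading of the ratio as CpG counts divided by GpC counts, the function names and the toy sequence are all assumptions made for illustration and are not taken from the original analysis.

```python
def gc_content(seq, window=50):
    """Return the GC fraction of every window-sized slice, stepping one base at a time."""
    seq = seq.upper()
    return [
        (seq[i:i + window].count("G") + seq[i:i + window].count("C")) / window
        for i in range(len(seq) - window + 1)
    ]

def count_dinucleotide(seq, dinuc):
    """Count occurrences of a dinucleotide, allowing overlaps."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)

def cpg_gpc_ratio(seq):
    """Ratio of CpG to GpC dinucleotide counts (assumed meaning of the legend's ratio)."""
    gpc = count_dinucleotide(seq, "GC")
    return count_dinucleotide(seq, "CG") / gpc if gpc else float("nan")

# Invented 80 nt sequence, used only to exercise the functions.
toy = "CGCGGCGCATATCGCG" * 5
profile = gc_content(toy)
print(len(profile), max(profile))
print(cpg_gpc_ratio(toy))
```

A ratio near 1 indicates that CpG is about as frequent as GpC, which is the sense in which the legend speaks of a lack of CpG suppression.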
Figure 14(b). Analysis of DNase I hypersensitivity at
the PKD1 CpG island. DNA isolated from HeLa cells treated
with an increasing amount of DNase I (left to right; first
lane contains no DNase I), digested with BamH I and
hybridised with CW45. A fragment about 400 bp smaller than
the restriction fragment is seen with increasing DNase I,
indicating a hypersensitive site as shown in (a). SM3 is
within the duplicated area and so both the PKD1 and HG loci
are assayed together. The degree of DNase I digestion seen
at the end of the assay indicates that cleavage occurs at
the PKD1 and HG loci.

Figure 15 provides the sequence of the PKD1 transcript
and predicted protein. The full sequence of 14,148 bp from
the transcription start site to the poly A tail is shown.
The probable signal sequence of 23 amino acids is shown
after the first methionine (underlined) plus the cleavage
site (arrow). The predicted transmembrane (TM) domains
(double underlined and numbered) and N-linked glycosylation
sites (asterisk) are indicated. The position of a possible






WO 95/34649 




PCT/GB95/01386 



hinge sequence is underlined and tyrosine kinase and protein
kinase C phosphorylation sites marked with a box and circle,
respectively.

Figure 16(a). The leucine rich repeats (LRRs) found in
the PKD1 protein (72-125aa) are compared with each other and
to the LRR consensus (Rothberg, 1990; Kobe, 1994); a,
aliphatic. A total of just over 2 full repeats are present
in PKD1 but they have been arranged into 3 incomplete
repeats to show their similarity to those found in slit
(Rothberg, 1990). The black boxes show identity to the LRR
consensus and shaded boxes other regions of similarity
between the repeats which have also been noted in other LRRs
(Kobe, 1994).

Figure 16(b). The amino flanking region to the LRR in
the PKD1 protein (33-71aa) is compared to similar regions from
a variety of other proteins. Black boxes show identity
with the consensus (adapted from Rothberg, 1990) and
shaded boxes conserved amino acids. The different types of
residue indicated in the consensus are: a, as above; p,
polar or turn-like; h, hydrophobic. The listed proteins,
with the species and Protein Identification Resource no.
(PIR) shown in brackets, are: OMgp, oligodendrocyte myelin
glycoprotein (Human; A34210); Slit (Drosophila; A36665);
Chaoptin (Drosophila; A29943); GP IB Beta, platelet
glycoprotein Ib beta chain (Human; A31929); Pg1, proteoglycan-1
(Mouse; S20811); Biglycan (Human; A40757); Trk (Human;
A25184) and LH-CF, lutropin-choriogonadotrophin receptor
(Rat; A41343).
Figure 16(c). The carboxy flanking region of the LRR
repeat from the PKD1 protein (126-180aa) compared to
similar regions in other proteins and a consensus adapted
from Rothberg, 1990. The shading and amino acid
types are as above. The proteins not described above are:
Toll (Drosophila; A29943) and GP IX, platelet glycoprotein
IX (Human; A46606).

Figure 17 is a sequence comparison of the C-type lectin
domain. The PKD1 lectin domain (403-532aa) is compared to
those of: BRA3, acorn barnacle lectin (JC1503); Kupffer cell
carbohydrate-binding receptor (Rat; A28166); CSP, cartilage
specific proteoglycan (Bovine; A27752); Agp,
asialoglycoprotein receptor (Human; 55283); E-Selectin
(Mouse; B42755) and glycoprotein gp120 (Human; A46274).
Black squares show identity with the consensus and shaded
boxes conserved residues. Amino acid types are: Very
highly conserved residues are shown in bold in the consensus
which is adapted from Drickamer 1987, Drickamer 1988.

Figure 18 is a sequence analysis of the Ig-like repeat.
The 16 copies of the PKD1 Ig-like repeat (PKD I, 273-356 aa;
PKD II-XVI, 851-2145aa) are compared to each other and to:
V.a. colA and C.p. colA, collagenases of Vibrio
alginolyticus (S19658) and Clostridium perfringens (D13791),
respectively; Pmel17, melanocyte specific glycoprotein
(Human; A41234); FLT4, Ig repeat IV of fms-like tyrosine
kinase 4 (Human; X68203); CaVPT, Ig repeat I of target
protein of the calcium vector protein (CAVP) (amphioxus;
P05548). Black boxes show amino acids identical in more
than 5 repeats and shaded boxes related residues. An Ig
consensus determined from Harpaz et al. 1994 and Takagi et
al. 1990 is shown in the symbols: a, aliphatic; h,
hydrophobic; s, small and b, base, with the predicted
positions of the β-strands indicated below. The PKD repeat
IV has an extra repetition of 20 aa in the centre of the
repeat while all of the others are between 84-87 aa.
Figure 19 reveals type III-related fibronectin domains.
The four fibronectin-related domains from the PKD1 protein
(2169-2573aa) are compared to similar domains in: Neuroglian
(Drosophila; A32579); L1, neural recognition molecule L1
(X59847); F11, neural cell recognition molecule F11
(X14877); TAG-1, transiently expressed axonal surface
glycoprotein-1 (Human; S28830); F3, Neuro-1 antigen (mouse;
S05944); NCAM, neural cell adhesion molecule (Rat; X06564);
DCC, deleted in colorectal cancer (Human; X76132); LAR,
Leukocyte-common antigen related molecule (Human; Y00815);
HPTPβ, protein tyrosine phosphatase beta (Human; X54131) and
FN, fibronectin (Human; X02761). The consensus sequence is
compiled from Bork and Doolittle (1993), Kuma et al. (1993),
Baron et al. (1992) and Bork and Doolittle (1992). Black
boxes show identity to highly conserved residues and shaded
boxes conserved changes or similarity in less highly
conserved positions. The approximate positions of the β
strands are illustrated. The fibronectin repeats in the
PKD1 protein are linked by sequences of 27aa (A-B), 22aa
(B-C) and 7aa (C-D) which are not shown.

Figure 20 presents a proposed model of the PKD1
protein, polycystin. The predicted structure of the PKD1
protein is shown.



DETAILED DESCRIPTION

All references mentioned herein are listed in full at
the end of the description and are herein incorporated by
reference in their entirety. Except where the context
clearly indicates otherwise, references to the PBP gene,
transcript, sequence, protein or the like can be read as
referring to the PKD1 gene, transcript, sequence, protein or
the like, respectively.

A translocation associated with ADPKD

A major pointer to the identity of the PKD1 gene was
provided by a Portuguese pedigree (family 77) with both
ADPKD and TSC (Figure 1b). Cytogenetic analysis showed that
the mother, 77-2, has a balanced translocation, 46XX
t(16;22)(p13.3;q11.21), which was inherited by her daughter,
77-3. The son, 77-4, has the unbalanced karyotype, 45XY-16-
22+der(16)(16qter-->16p13.3::22q11.21-->22qter) and
consequently is monosomic for 16p13.3-->16pter as well as
for 22q11.21-->22pter. This individual has the clinical
phenotype of TSC (see Experimental Procedures); the most
likely explanation is that the TSC2 locus located within
16p13.3 is deleted in the unbalanced karyotype.

Further analysis revealed that the mother (77-2) and
the daughter (77-3) with the balanced translocation have
the clinical features of ADPKD (see Experimental
Procedures), while the parents of 77-2 were cytogenetically
normal, with no clinical features of TSC and no renal cysts
on ultrasound examination (aged 67 and 82 years). Although
kidney cysts can be a feature of TSC, no other clinical
signs of TSC were identified in 77-2 or 77-3, making it
unlikely that the polycystic kidneys were due to TSC. We
therefore investigated the possibility that the
translocation disrupted the PKD1 locus in 16p13.3 and
proceeded to identify and clone the region containing the
breakpoint.

The 77 family was analyzed with polymorphic markers
from 16p13.3. Individual 77-4 was hemizygous for MS205.2
and GGG1, but heterozygous for SM6 and more proximal
markers, locating the translocation breakpoint between GGG1
and SM6 (see Figure 1a). Fluorescence in situ hybridization
(FISH) of a cosmid from the TSC2 region, CW9D (cosmid 1 in
Figure 1a), to metaphase spreads showed that it hybridized
to the der(22) chromosome of 77-2, placing the breakpoint
proximal to CW9D and indicating that 77-4 was hemizygous for
this region, consistent with his TSC phenotype. DNA from
members of the 77 family was digested with Cla I, separated
by PFGE and hybridized with SM6, revealing a breakpoint
fragment of about 100 kb in individuals with the der(16)
chromosome (Figure 1c). The small size of this novel
fragment enabled the breakpoint to be localized distal to
SM6 in a region of just 60 kb (Figure 1a). A cosmid contig
covering this region was therefore constructed (see
Experimental Procedures for details).
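
The marker reasoning in the preceding paragraph can be illustrated with a short sketch. It assumes that the markers are supplied in distal-to-proximal order (MS205.2, GGG1, SM6, as implied by the text) and that each is scored simply as hemizygous or heterozygous in 77-4; the function name and data layout are hypothetical and are not drawn from the original study.

```python
def breakpoint_interval(markers):
    """Return the adjacent marker pair flanking a hemizygous-to-heterozygous transition.

    `markers` is a list of (name, status) tuples ordered distal to proximal.
    """
    for (distal, d_status), (proximal, p_status) in zip(markers, markers[1:]):
        if d_status == "hemizygous" and p_status == "heterozygous":
            return distal, proximal
    return None

# Scores for individual 77-4 as stated in the text; the ordering is an assumption.
scores_77_4 = [
    ("MS205.2", "hemizygous"),
    ("GGG1", "hemizygous"),
    ("SM6", "heterozygous"),
]

print(breakpoint_interval(scores_77_4))  # ('GGG1', 'SM6')
```

Loss of one allele distal to the breakpoint and retention of both alleles proximal to it is what places the breakpoint in the interval between the last hemizygous and the first heterozygous marker.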

The translocation breakpoint lies within a region duplicated
elsewhere on chromosome 16p (16p13.1)

It is noted hereabove that the region between CW21 and
N54 (Figure 1a) was duplicated at a more proximal site on
the short arm of chromosome 16 (Germino, et al., 1992;
European Chromosome 16 Tuberous Sclerosis Consortium, 1993).
Figure 2 shows that a cosmid, CW10III, from the duplicated
region hybridized to two points on 16p: the distal PKD1
region and a proximal site positioned in 16p13.1. The
structure of the duplicated area is complex, with each
fragment present once in 16p13.3 re-iterated two-four times
in 16p13.1 (see Figure 2). Cosmids spanning the duplicated
area in 16p13.3 were subcloned (see Figure 3a and
Experimental Procedures for details) and a restriction map
was generated. A genomic map of the PKD1 region was
constructed using a radiation hybrid, Hy145.19, which
contains the distal portion of 16p but not the duplicate
site in 16p13.1.

To localize the 77 translocation breakpoint, subclones
from the target region were hybridized to 77-2 DNA, digested
with Cla I and separated by PFGE. Once probes mapping
across the breakpoint were identified they were hybridized
to conventional Southern blots of 77 family DNA. Figure 3b
shows that novel BamH I fragments were detected from the
centromeric and telomeric side of the breakpoint, which was
localized to the distal part of the probe 8S1 (Figure 3a).
Hence, the balanced translocation was not associated with a
substantial deletion, and the breakpoint was located more
than 20 kb proximal to the TSC2 locus (Figure 3a). These
results supported the hypothesis that polycystic kidney
disease in individuals with the balanced translocation (77-2
and 77-3) was not due to disruption of the TSC2 gene, but
indicated that a separate gene mapping just proximal to
TSC2 was likely to be the PKD1 gene.

The polycystic breakpoint (PBP) gene is disrupted by the
translocation

Localization of the 77 breakpoint identified a precise
region in which to look for a candidate for the PKD1 gene.
During the search for the TSC2 gene we identified other
transcripts not associated with TSC including a large
transcript (about 14 kb) partially represented in the cDNAs
3A3 and AH4 which mapped to the genomic fragments CW23 and
CW21 (Figure 3a). The orientation of the gene encoding this
transcript had been determined by the identification of a
polyA tract in the cDNA, AH4: the 3' end of this gene lies
very close to the TSC2 gene, in a tail to tail orientation
(European Chromosome 16 Tuberous Sclerosis Consortium,
1993). To determine whether this gene crossed the
translocation breakpoint, genomic probes from within the
duplicated area and flanking the breakpoint were hybridized
to Northern blots. Probes from both sides of the
breakpoint, between JH5 and JH13, identified the 14 kb
transcript (Figure 3a and see below for details).
Therefore, this gene, called 3A3 but now designated the PBP
gene, extended over the 77 breakpoint and consequently was a
candidate for the PKD1 gene. A walk was initiated to
increase the extent of the PBP cDNA contig and several new
cDNAs were identified using probes from the single copy
(non-duplicated) region (see Experimental Procedures for
details). A cDNA contig was constructed which extended
about 5.7 kb, including about 2 kb into the area that is
duplicated (Figure 3a).

Expression of the PBP gene

Initial studies of the expression pattern of the PBP
gene were undertaken with cDNAs that map entirely within the
single copy region (e.g. AH4 and 3A3). Figure 4a shows that
the about 14 kb transcript was identified by 3A3 in various
tissue-specific cell lines. From this and other Northern
blots we concluded that the PBP gene was expressed in all of
the cell lines tested, although often at a low level. The
two cell lines which showed the highest level of expression
were fibroblasts and a cell line derived from an
astrocytoma, G-CCM. Significant levels of expression were
also obtained in cell lines derived from kidney (G401) and
liver (Hep3B). Measuring the expression of the PBP gene in
tissue samples by Northern blotting proved difficult because
such a large transcript is susceptible to minor RNA
degradation. However, initial results with an RNase
protection assay, using a region of the gene located in the
single copy area (see Experimental Procedures), showed a
moderate level of expression of the PBP gene in tissue
obtained from normal and polycystic kidney (data not shown).
The widespread expression of the PBP gene is consistent with
the systemic nature of ADPKD.

Identification of transcripts that are partially homologous
to the PBP transcript

New cDNAs were identified with the genomic fragments,
JH4 and JH8, that map to the duplicated region (Figure 3a
and see Experimental Procedures). However, when these cDNAs
were hybridized to Northern blots a more complex pattern
than that seen with 3A3 was observed. As well as the ~14 kb
PBP transcript, three other, partially homologous
transcripts were identified, designated homologous gene-A
(HG-A; ~21 kb), HG-B (~17 kb) and HG-C (8.5 kb) (Figure 4b).
There were two possible explanations for these results:
either the HG transcripts were alternatively spliced forms
of the PBP gene, or the HG transcripts were encoded by genes
located in 16p13.1. To determine the genomic location of
the HG loci a fragment from the 3' end of one HG cDNA
(HG-4/1.1) was isolated. HG-4/1.1 hybridized to all three HG
transcripts, but not to the PBP transcript, and on a hybrid
panel it mapped to 16p13.1 (not the PKD1 area). These
results show that all the HG transcripts are related to each
other outside the region of homology with the PBP transcript
and that the HG loci map to the proximal site (16p13.1).

An abnormal transcript associated with the 77 translocation

As the PBP gene was transcribed across the region
disrupted by the 77 translocation breakpoint, in a proximal
to distal direction on the chromosome (see Figure 3a), it was
possible that a novel transcript originating from the PBP
promoter would be found in this family. Figure 4c shows
that, using a probe to the PBP transcript that mapped mainly
proximal to the breakpoint, a novel transcript of
approximately 9 kb (PBP-77) derived from the der(16) product
of the translocation was detected. Interestingly, the PBP-77
transcript appears to be expressed at a higher level than
the normal PBP product. These results confirmed that the 77
translocation disrupts the PBP gene and support the
hypothesis that this is the PKD1 gene.

Mutations of the PBP gene in other ADPKD patients

To prove that the PBP gene is the defective gene at the
PKD1 locus, we analysed this region for mutations in
patients with typical ADPKD. The 3' end of the PBP gene was
most accessible to study as it maps outside the duplicated
area. To screen this region BamH I digests of DNA from 282
apparently unrelated ADPKD patients were hybridized with the
probe 1A1H.6 (see Figure 3a). In addition, a large EcoR I
fragment (41 kb) which contains a significant proportion of
the PBP gene was assayed by field inversion gel
electrophoresis (FIGE) in 167 ADPKD patients, using the
probe CW10. Two genomic rearrangements were identified in
ADPKD patients by these procedures, each identified by both
methods.

The first rearrangement was identified in patient OX875
(see Experimental Procedures for clinical details) who was
shown to have a 5.5 kb genomic deletion within the 3' end
of the PBP gene, producing a smaller transcript (PBP-875)
(see Figures 5a, b and 3a for details). This genomic
deletion results in a ~3 kb internal deletion of the
transcript with the ~500 bp adjacent to the polyA tail
intact. In this family linkage of ADPKD to chromosome 16
could not be proven because, although OX875 has a positive
family history of ADPKD, there were no living, affected
relatives. However, paraffin-embedded tissue from her
affected father (now deceased) was available. We
demonstrated that this individual has the same rearrangement
as OX875 by PCR amplification of a 220 bp fragment spanning
the deletion (data not shown). This result and analysis of
two unaffected sibs of OX875, who did not have the
deletion, showed that this mutation was transmitted with
ADPKD.

The second rearrangement detected by hybridization was
a 2 kb genomic deletion within the PBP gene in ADPKD
patient OX114 (see Experimental Procedures for clinical
details and Figures 5c and 3a). No abnormal PBP transcript
was identified by Northern blot analysis, but using primers
flanking the deletion (see Experimental Procedures), a
shortened product was detected by RT-PCR (Figure 5c). This
was cloned and sequenced and shown to have a frame-shift
deletion of 446 bp (between base pair 1746 and 2192 of the
sequence shown in Figure 7). OX114 is the only member of
the family with ADPKD (she has no children) and ultrasound
analysis of her parents at age 78 (father) and 73 years old
(mother) showed no evidence of renal cysts. Somatic cell
hybrids were produced from OX114 and the deleted chromosome
was found to be of paternal origin by haplotype analysis.
The father of OX114 is now deceased but analysis of DNA from
the brother of OX114 (OX984) with seven microsatellite
markers from the PKD1 region (see Experimental Procedures)
showed that he shares the same paternal chromosome in the
PKD1 region as OX114. Renal ultrasound revealed no cysts
in OX984 at age 53 and no deletion was detected by DNA
analysis (Figure 5c). Hence the deletion in OX114 is a de
novo event associated with the development of ADPKD.
Although it is not possible to show that the ADPKD is
chromosome 16-linked, the location of the PBP gene indicated
that this is a de novo PKD1 mutation.

To identify more PKD1-associated mutations, single copy
regions of the PBP gene were analyzed by RT-PCR using RNA
isolated from lymphoblastoid cell lines established from
ADPKD patients. cDNA from 48 unrelated patients was
amplified with the primer pair 3A3 C (see Experimental
Procedures), and the product of 260 bp was analyzed on an
agarose gel. In one patient, OX32, an additional, smaller
product (125 bp) was identified, consistent with a deletion
or splicing mutation. OX32 comes from a large family in
which the disease can be traced through three generations.
Analysis of RNA from two affected sibs of OX32 and his
parents showed that the abnormal transcript segregates with
PKD1 (Figure 5d).

Amplification of normal genomic DNA with the 3A3 C
primers generates a product of 418 bp; sequencing showed
that this region contains two small introns (5', 75 bp and
3', 83 bp) flanking a 135 bp exon. The product amplified
from OX32 genomic DNA was normal in size, excluding a
genomic deletion. However, heteroduplex analysis of that
DNA revealed larger heteroduplex bands, consistent with a
mutation within that genomic interval. The abnormal OX32
RT-PCR product was cloned and sequenced: this demonstrated
that, although present in genomic DNA, the 135 bp exon was
missing from the abnormal transcript. Sequencing of OX32
genomic DNA demonstrated a G-->C transversion at +1 of the
splice donor site following the 135 bp exon. This mutation
was confirmed in all available affected family members by
digesting amplified genomic DNA with the enzyme Bst NI: a
site is destroyed by the base substitution. The splicing
defect results in an in-frame deletion of 135 bp from the
PBP transcript (3696 bp to 3831 bp of the sequence shown in
Figure 7). Together, the three intragenic mutations confirm
that the PBP gene is the defective gene at the PKD1 locus.
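
The fragment sizes reported for the OX32 splicing mutation can be cross-checked with simple arithmetic. The Python sketch below is illustrative only: the function and variable names are hypothetical, and it assumes that the genomic and RT-PCR products share the same primer sites, so that the exonic sequence flanking the 135 bp exon is the same in both.

```python
# Fragment sizes in bp, as reported in the text for the 3A3 C primer pair.
GENOMIC_PRODUCT = 418  # product amplified from normal genomic DNA
CDNA_PRODUCT = 260     # RT-PCR product from the normal transcript
INTRON_5_PRIME = 75
INTRON_3_PRIME = 83
EXON = 135             # exon lying between the two introns


def exonic_flank(genomic: int, introns: list[int], exon: int) -> int:
    """Exonic sequence in the genomic amplicon outside the central exon."""
    return genomic - sum(introns) - exon


def product_after_exon_skipping(cdna: int, exon: int) -> int:
    """Expected RT-PCR product size when the central exon is skipped."""
    return cdna - exon


flank = exonic_flank(GENOMIC_PRODUCT, [INTRON_5_PRIME, INTRON_3_PRIME], EXON)
skipped = product_after_exon_skipping(CDNA_PRODUCT, EXON)

print(f"exonic flank in genomic amplicon: {flank} bp")
print(f"normal cDNA product (flank + exon): {flank + EXON} bp")
print(f"product after exon skipping: {skipped} bp")
```

Under these assumptions the flanking exonic sequence plus the 135 bp exon reproduces the 260 bp normal product, and removing the exon gives the smaller product seen in OX32.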
Deletions that disrupt the TSC2 and the PKD1 gene

The deletion called WS-53 disrupts both the TSC2 gene
and the PKD1 gene (European Chromosome 16 Tuberous Sclerosis
Consortium, 1993), although the full proximal extent of the
deletion was not determined. Further study has shown that
the deletion extends ~100 kb (see Figure 6 for details) and
deletes most if not all of the PKD1 gene. This patient has
TSC but also has unusually severe polycystic disease of the
kidneys. Other patients with a similar phenotype have also
been under investigation. Deletions involving both TSC2 and
PKD1 were identified and characterized in six patients in
whom TSC was associated with infantile polycystic kidney
disease. As well as the deletion in WS-53, those in WS-215
and WS-250 also extended proximally well beyond the known
distribution of PKD1 and probably delete the entire gene.
The deletion in WS-194 extended over the known extent of
PKD1, but not much further proximally, while the proximal
breakpoints in WS-219 and WS-227 lay within PKD1 itself.
Northern analysis of case WS-219 with probe JH8, which lies
outside the deletion, showed a reduced level of the PKD1
transcript but no evidence of an abnormally sized transcript
(data not shown). Analysis of samples from the clinically
unaffected parents of patients WS-53, WS-215, WS-219, WS-227
and WS-250 showed the deletions in these patients to be de
novo. The father of WS-194 was unavailable for study.

In a further case (WS-212) renal ultrasound showed no
cysts at four years of age but a deletion was identified
which removed the entire TSC2 gene and deleted an XbaI site
which is located 42 bp 5' to the polyadenylation signal of
PKD1. To determine the precise position of the proximal
breakpoint in PKD1, a 587 bp probe from the 3' untranslated
region (3'UTR) was hybridized to XbaI digested DNA. A 15 kb
XbaI breakpoint fragment was detected with an
approximately equal intensity to the normal fragment of 6 kb,
indicating that most of the PKD1 3'UTR was preserved on the
mutant chromosome. Evidence that a PKD1 transcript is
produced from the deleted chromosome in WS-212 was obtained
by 3' rapid amplification of cDNA ends (RACE) with a novel,
smaller product generated from WS-212 cDNA.
Characterization of this product showed that polyadenylation
occurs 546 bp 5' to the normal position, within the 3'UTR of
PKD1 (231 bp 3' to the stop codon at 5073 bp of the described
PKD1 sequence). A transcript with an intact open reading
frame is thus produced from the deleted WS-212 chromosome.
It is likely that a functional PKD1 protein is produced from
this transcript, explaining the lack of cystic disease in
this patient. The sequence preceding the novel site of
polyA addition is:

AGTCAGT AATTTA TATGGTGTTAAAATGTG (A)n.

Although not conforming precisely to the consensus of
AATAAA, it is likely that part of this AT-rich region acts
as an alternative polyadenylation signal if, as in this
case, the normal signal is deleted (a possible sequence is
underlined).
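
One way to look for a candidate alternative signal in the sequence quoted above is to scan it for hexamers that differ from the AATAAA consensus at only a few positions. The following Python sketch is an illustration, not the method used in the study: the two-mismatch tolerance and all function names are assumptions chosen for the example.

```python
CONSENSUS = "AATAAA"
# Sequence preceding the novel polyA addition site in WS-212, as quoted in
# the text (spaces removed, poly(A) tail omitted).
UPSTREAM = "AGTCAGTAATTTATATGGTGTTAAAATGTG"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_signals(seq: str, consensus: str = CONSENSUS,
                      max_mismatches: int = 2) -> list[tuple[int, str, int]]:
    """Return (0-based offset, hexamer, mismatches) for near-consensus windows."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        distance = hamming(window, consensus)
        if distance <= max_mismatches:
            hits.append((i, window, distance))
    return hits


hits = candidate_signals(UPSTREAM)
for offset, hexamer, distance in hits:
    print(offset, hexamer, distance)
```

The scan reports every window within the tolerance, so more than one candidate may be listed; the AATTTA hexamer set apart in the quoted sequence is among them, at two mismatches from the consensus.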

The WS-212 deletion is 75 kb between SM9-CW9 distally
and the PKD1 3'UTR proximally. The WS-215 deletion is 160 kb
between CW15 and SM6-JH17. WS-194 has 65 kb deleted between
CW20 and CW10-CW36, WS-227 has a 50 kb deletion between CW20
and JH11 and WS-219 has a 27 kb deletion between JH1 and JH6.
The distal end of the WS-250 deletion is in CW20 but the
precise location of the proximal end is not known. However,
the same breakpoint fragment of 320 kb is seen with PvuI-
digested DNA using probes on adjacent PvuI fragments, CE1B
(which normally detects a 245 kb fragment) and Blu24 (235 kb).
Hence this deletion can be estimated as ~160 kb.

b. PFGE analysis of the deletion in WS-219. MluI digested
DNA from a normal control (N) and WS-219 probed with the
clones H2, JH1, CW21 and CW10, which detect an ~130 kb
fragment in normal individuals. CW10 also detects a much
smaller fragment from the duplicated region situated more
proximally on 16p. A novel fragment of ~100 kb is seen in
WS-219 with probes H2 and CW10, which flank the deletion in
this patient. JH1 is partially deleted but detects the novel
band weakly. The aberrant fragment is not detected by CW21,
which is deleted on the mutant chromosome. BamHI digested
DNA of normal control (N) and WS-219 separated by
conventional gel electrophoresis and hybridized to probes
JH1 and JH6, which flank the deletion. The same breakpoint
fragment of ~3 kb is seen with both probes, consistent with
a deletion of ~27 kb ending within the BamHI fragments seen
by these probes.
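
The deletion sizes inferred from the pulsed field data follow from fragment-length arithmetic. The Python sketch below is a minimal illustration with hypothetical function names. It assumes a WS-250 junction fragment of 320 kb, which is the value implied by the stated ~160 kb estimate and the two normal fragment sizes.

```python
def deletion_from_adjacent_fragments(normal_a_kb: float, normal_b_kb: float,
                                     junction_kb: float) -> float:
    """Deletion size when two adjacent restriction fragments are fused
    into a single junction fragment by loss of the site between them."""
    return normal_a_kb + normal_b_kb - junction_kb


def deletion_within_fragment(normal_kb: float, novel_kb: float) -> float:
    """Deletion size when a single restriction fragment is shortened."""
    return normal_kb - novel_kb


# WS-250: CE1B (245 kb) and Blu24 (235 kb) detect the same junction fragment.
ws250_kb = deletion_from_adjacent_fragments(245, 235, 320)

# WS-219: the ~130 kb MluI fragment is replaced by a novel ~100 kb fragment.
ws219_kb = deletion_within_fragment(130, 100)

print(f"WS-250 deletion: ~{ws250_kb} kb")
print(f"WS-219 deletion: ~{ws219_kb} kb (27 kb by finer mapping)")
```

The WS-219 value from pulsed field gels is only approximate; the 27 kb figure comes from the finer BamHI mapping described in the text.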
Two further deletions

In addition we have characterized two further mutations
of this gene which were identified in typical PKD1 families.
In both cases the mutation is a deletion in the 75 bp intron
amplified by the primer pair 3A3 C (European Polycystic
Kidney Disease Consortium, 1994). The deletions are of 18 bp
and 20 bp, respectively, in the patients 461 and OX1054.
Although these deletions do not disrupt the highly conserved
sequences flanking the exon/intron boundaries, they do
result in aberrant splicing of the transcript. In both
cases, two abnormal mRNAs are produced, one larger and one
smaller than normal. Sequencing of these cDNAs showed that
the larger transcript includes the deleted intron, and so
has an in-frame insertion of 57 bp in 461, while OX1054 has
a frameshift insertion of 55 bp. The smaller transcript is
due to activation of a cryptic splice site in the exon
preceding the deleted intron and results in an in-frame
deletion of 66 bp in both patients. The demonstration of two
additional mutations of this gene in PKD1 patients further
confirms that this is the PKD1 gene.
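
Whether each aberrant transcript keeps the reading frame depends only on whether the inserted or deleted length is a multiple of three. A short Python sketch of that check is given here for illustration; the helper names are hypothetical and the lengths are those stated in the text.

```python
INTRON_LENGTH = 75  # bp, the intron carrying both deletions


def retained_intron_insertion(intron_length: int, deletion_length: int) -> int:
    """Length inserted into the mRNA when the shortened intron is retained."""
    return intron_length - deletion_length


def frame_effect(length_change: int) -> str:
    """Classify a length change by its effect on the reading frame."""
    return "in-frame" if length_change % 3 == 0 else "frameshift"


# Intronic deletion sizes (bp) reported for the two patients.
deletions = {"461": 18, "OX1054": 20}

for patient, deleted in deletions.items():
    inserted = retained_intron_insertion(INTRON_LENGTH, deleted)
    print(patient, inserted, frame_effect(inserted))

# The cryptic splice site removes 66 bp of the preceding exon in both patients.
print("cryptic splice deletion:", frame_effect(66))
```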
Partial characterization of the PKD1 gene

To characterize the PKD1 gene further, evolutionary
conservation was analyzed by 'zoo blotting'. Using probes
from the single copy 3' region (3A3) and from the
duplicated area (JH4, JH8) the PKD1 gene was found to be
conserved in other mammalian species, including horse, dog,
pig and rodents (data not shown). No evidence of related
sequences was seen in chicken, frog or Drosophila by
hybridization at normal stringency. The degree of
conservation was similar when probes from the single copy or
the duplicated region were employed.

Although the full genomic extent of the PKD1 gene was
not yet known, results obtained by hybridization to Northern
blots showed that it extended at least as far as JH13.
Several CpG islands were localized 5' of the known extent of
the PKD1 gene (Figure 6), although there was no direct
evidence that any of these are associated with this gene.

The cDNA contig extending 5631 bp to the 3' end of the
PKD1 transcript was sequenced; where possible more than one
cDNA was analyzed and in all regions both strands were
sequenced (Figure 7). We estimated that this accounts for
~40% of the PKD1 transcript. An open reading frame was
detected which runs from the 5' end of the region sequenced
and spans 4842 bp, leaving a 3' untranslated region of 789
bp which contains the previously described microsatellite,
KG8 (Peral, et al., 1994; Snarey, et al., 1994). A
polyadenylation signal is present at nucleotides 5598-5603
and a polyA tail was detected in two independent cDNAs (AH4
and AH6) at position 5620. Comparison with the cDNAs HG-4
and 11BHS21, which are encoded by genes in the duplicate
16p13.1 region, shows that 1866 bp at the 5' end of the
partial PKD1 sequence shown in Figure 7 lies within the
duplicated area. The predicted amino acid sequence from the
available open reading frame extends 1614 residues, and is
shown in Figure 7. A search of the SwissProt and NBRF
databases with the available protein sequence, using the
BLAST program (Altschul, et al., 1990) identified only short
regions of similarity (notably, between amino acids 690-770
and 1390-1530) to a diverse group of proteins; no highly
significant areas of homology were recognized. The
importance of the short regions of similarity is unclear as
the search for protein motifs with the ProSite program did
not identify any recognized functional protein domains
within the PKD1 gene.
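
The coordinates quoted for the partial cDNA contig are internally consistent, as the following Python sketch shows. It is only a bookkeeping check with hypothetical helper names, and it assumes 1-based inclusive nucleotide coordinates and that the 4842 bp figure counts coding nucleotides only.

```python
CONTIG_LENGTH = 5631          # bp of sequenced cDNA contig
ORF_LENGTH = 4842             # bp of open reading frame within the contig
POLYA_SIGNAL = (5598, 5603)   # first and last nucleotide of the signal
POLYA_SITE = 5620             # position at which the polyA tail was found


def utr_length(contig: int, orf: int) -> int:
    """3' untranslated length when the ORF starts at the 5' end of the contig."""
    return contig - orf


def codon_count(orf: int) -> int:
    """Number of complete codons in an ORF of the given length."""
    return orf // 3


def span(start: int, end: int) -> int:
    """Length of an inclusive coordinate interval."""
    return end - start + 1


print("3'UTR length:", utr_length(CONTIG_LENGTH, ORF_LENGTH))
print("predicted residues:", codon_count(ORF_LENGTH))
print("polyA signal length:", span(*POLYA_SIGNAL))
print("signal end to polyA site:", POLYA_SITE - POLYA_SIGNAL[1], "nt")
```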

The task of identifying and characterizing the PKD1
gene has been more difficult than for other disorders
because more than three quarters of the gene is embedded in
a region of DNA that is duplicated elsewhere on chromosome
16. This segment of 40-50 kb of DNA, present as a single
copy in the PKD1 area (16p13.3), is re-iterated as several
divergent copies in the more proximal region, 16p13.1. This
proximal site contains three gene loci (HG-A, -B and -C)
that each produce polyadenylated mRNAs and share substantial
homology to the PKD1 gene; it is not known whether these
partially homologous transcripts are translated into
functional proteins.

Although gene amplification is known as a major
mechanism for creating protein diversity during evolution,
the discovery of a human disease locus embedded within an
area duplicated relatively recently is a new observation.
In this case, because of the recent nature of the
reiteration, the whole duplicated genomic region retains a
high level of homology, not just the exons. The sequence of
events leading to the duplication and which sequence
represents the original gene locus are not yet clear.
However, early evidence of homology of the 3' ends of the
three HG transcripts, which are different from the 3' end of
the PKD1 gene, indicated that the loci in 16p13.1 have
probably arisen by further reiteration of sequences at this
site, after it separated from the distal locus.

To try to overcome the duplication problem we employed
an exon linking approach using RNA isolated from a radiation
hybrid, HY145.19, that contains just the PKD1 part of
chromosome 16, and not the duplicate site in 16p13.1.
Hence, this hybrid produces transcripts from the PKD1 gene
but not from the homologous genes (HG-A, HG-B and HG-C). We
have also sequenced much of the genomic region containing
the PKD1 gene, from the cosmid JH2A, and have sequenced a
number of cDNAs from the HG locus. To determine the likely
position of PKD1 exons in the genomic DNA we compared HG
cDNAs (HG-4 and HG-7) to the genomic sequence. We then
designed primers with sequences corresponding to the genomic
DNA, to regions identified by the HG exons and, employing DNA
generated from the hybrid HY145.19, we amplified sections of
the PKD1 transcript. The polymerase Pfu was used to
minimise incorporation errors. These amplified fragments
[...] sequence is shown in Figure 10 is made up of (3'-5') the
original 5.7 kb of sequence shown in Figure 7, and the
cDNAs: gap a 22 (890 bp), gap gamma (872 bp); a section of
genomic DNA from the clone JH8 (2,724 bp) which corresponds
to a large exon, S1-S3 (733 bp), S3-S4 (1,589 bp) and S4-S13
(1,372 bp). Together these make a cDNA of 13,807 nt. When
these cDNAs from the PKD1 contig were sequenced an open
reading frame was found to run from the start of the contig
to the stop codon, a region of 13,018 bp. The predicted
protein encoded by the PKD1 transcript is also shown in
Figure 10 and has 4,339 amino acid residues.
Cloning a full length PKD1 cDNA 

cDNAs known to originate from the PKD1 or HG
transcripts show on average a sequence divergence of less
than 3%. Consequently, although many cDNAs were identified
by hybridisation of various PKD1 genomic probes to cDNA
libraries, it proved difficult to differentiate genuine PKD1
clones from those of the HG transcripts. For this reason a
novel strategy was employed to clone the PKD1 transcript.

To obtain a template of genomic sequence of the PKD1
gene, clones which contain the transcribed region, JH6 and
JH8-JH13, were sequentially truncated and sequenced. These
clones were isolated from the cosmid JH2A, which extends
into the single copy area containing the 3' portion of the
PKD1 gene (Figure 13) and hence represents the PKD1 and not
the HG loci. As a result of this analysis a contig of about
18 kb of genomic sequence was generated, which was
ultimately found to encode >95% of the unsequenced portion
of the PKD1 transcript.

A number of HG cDNA clones identified by the DNA probes
JH8 or JH13 (including HG-4, HG-7C and 13A1) were sequenced.
Clones identified by JH8 were chosen because this genomic
area is duplicated fewer times than the surrounding DNA,
with only the HG-A and HG-B transcripts (not HG-C)
homologous to this region. The comparison of these cDNA and
genomic sequences showed a characteristic intron/exon
pattern and we concluded that the exons highlighted in the
genomic sequence were likely to be exons of the PKD1 gene.
To prove this, pairs of primers matching the sequence of the
putative PKD1 exons and spaced 0.7-2 kb apart in the
proposed transcript were synthesized. Employing RNA from
a radiation hybrid, HY145.19, that contains the PKD1 but not
the HG loci, PKD1-specific cDNAs were amplified by RT-PCR
and cloned (see Experimental Procedures for details). In
this way, a number of overlapping cDNAs spanning the PKD1
transcript, from the cDNAs at the 3' end to those homologous
to JH13, were cloned (Figure 13).

Analysis of a further cDNA, HG-6, showed that a short
region (~100 bp) of HG-6 lay 5' to the sequenced genomic
region and this was located by hybridisation to the genomic
clone SM3 (Figure 13); SM3 was subsequently sequenced. The
position of the cDNA in SM3 was identified and the possible
5' extent of this exon was determined in the genomic
sequence; an in-frame stop codon was identified near the 3'
end of the exon. This exon lay at a CpG island (described
hereinafter) suggesting, along with the presence of the stop
codon, that this may be the first exon of the PKD1 gene. To
determine the likely transcriptional start site the method
of primer extension from three different oligos within the
first exon was employed (see Experimental Procedures). In
all cases, a transcriptional start was identified at the
same G nucleotide and showed the first exon to be 426 bp.
The structure of the PKD1 transcript was confirmed by a
final exon link, rev1, which starts 3 bp 3' to the proposed
transcriptional start (see Figure 13 and Experimental
Procedures for details).

The intron/exon structure of the PKD1 gene

Sequencing the cDNA contig revealed a total sequence of
14,148 bp which extends over approximately 52 kb of genomic
sequence from SM3 to BFS5 (Figure 13). We were able to
determine the intron/exon structure of much of the gene by
direct comparison between the cDNA and genomic sequence. In
the 3' region of the gene (JH5-BFS5), a partial genomic
sequence was obtained at intron/exon borders by sequencing
the corresponding genomic clone from exonic primers.
The PKD1 CpG island 

The 5' end of the gene lies at CpG island SM3. SM3 is
located entirely within the duplicated region, but this
clone was isolated from the cosmid SM11 which extends
through the duplicated area into the proximal flanking
single copy region and therefore is known to originate from
this area. Figure 14 shows a map of the PKD1 CpG island
including genomic sites for several methylation sensitive




enzymes, the location of the first exon and the GC content
across the island. Evidence that the enzyme sites in the
PKD1 region (and not just the HG area) digest was obtained
by pulsed field gel electrophoresis with the enzymes Mlu I,
Not I and BssH II using probes outside the duplicated area.
Digestion of the Sac II sites and confirmation of the Not I
site was made with a panel of somatic cell hybrids which
either contain just the HG (P-MWH2A) or just the PKD1 locus
(HY145.19). These results showed that the Sac II and Not I
sites digest in both sets of hybrids (data not shown),
indicating that this region is a CpG island in the HG as
well as the PKD1 area. Further proof that this is the
likely position of a functional promoter was obtained by
analysis for DNAase 1 hypersensitivity. A DNAase
hypersensitive site in the region 5' to the transcription
start site in SM3 was detected (Figures 14a and b).
Analysis of the PKD1 transcript 

Analysis of the sequence shows an open reading frame
running from the start of the sequence to position 13,117 bp
(Figure 15). Detailed sequencing of the genomic region
containing the 3' portion of the gene revealed two extra Cs
at positions 13,081-2 (Figure 15). An in-frame start codon
which is consistent with the Kozak consensus was detected at
position 212 bp, just 3' to the stop codon in the 5'UTR.
Analysis for a signal sequence cleavage site using the von
Heijne (von Heijne, 1986) algorithm showed a high probability
of a hydrophobic signal sequence with cleavage at amino acid
23 (see Figure 15). The total length of the predicted
protein is 4302 aa with a calculated molecular mass after
excision of the signal peptide of 460 kD and an estimated
isoelectric point of 6.26. However, this may be an
underestimate of the total mass of the protein as many
potential sites for N-linked glycosylation are present
(Figure 15).

Homologies with the PKD1 protein

The predicted PKD1 protein was analysed for homologies
with known proteins in the SwissProt and NBRF databases using
the BLAST (Altschul et al., 1990) and FASTA algorithms. This
analysis revealed two clear homologies and also a number of
other potential similarities which were studied in detail.
Leucine rich repeat

Near the 5' end of the PKD1 protein is a region of
leucine rich repeats (LRRs). LRRs are a highly conserved
motif usually of 24 residues with precisely spaced leucines
(or other aliphatic amino acids) and an asparagine at
position 19 (Figure 16a and reviewed in Kobe and Deisenhofer
(1994)). Two complete LRRs plus a partial repeat unit are
found in the PKD1 protein, which have complete homology with
the LRR consensus.

Surrounding the LRRs are distinctive cysteine-rich
amino and carboxy flanking regions (Figures 16b and c).
This flank-LRR-flank structure is exclusively found on
proteins in extracellular locations and is thought to be
involved in protein-protein interactions such as adhesion to
other cells or to components of the extracellular matrix or
as a receptor concerned with binding or signal transduction.
The structure found in the PKD1 protein is similar to that
found in the Drosophila protein, slit, which is important
for normal central nervous system development (Rothberg,
1990). Although slit contains far more LRRs than the PKD1
protein, with four blocks each consisting of 4 or 5 repeat
units, the structure of each block is similar as they finish
on the amino and carboxy side with shortened LRRs which are
immediately flanked by the cysteine rich regions. In the
PKD1 protein two shortened LRRs surround one complete repeat
unit and immediately abut the amino and carboxy flanking
regions.

The amino flanking region consists of four invariant
cysteines and a number of other highly conserved residues in
an area of 30-40 amino acids; comparison of the PKD1 region
to amino flanking motifs of other proteins is shown in
Figure 4b. The carboxy flanking region extends over an area
of between 50-60 residues and consists of an invariant
proline and four cysteines plus several other highly
conserved amino acids. The similarity of the PKD1 region to
carboxy flanking regions from other proteins is shown in
Figure 4c.

Some LRR proteins, such as slit (Rothberg, 1990) and
small proteoglycans, are wholly extracellular but others
including Toll (Hashimoto et al., 1990) and trkC (Lamballe,
1991) have a single transmembrane sequence, while the LH-CG
receptor and related proteins have seven trans-membrane
segments and are involved in signal transduction.
C-type lectin domain

Analysis of the sequence from exons 6 and 7 showed a
high level of homology with a C-type lectin domain. C-type
lectins are found in a variety of proteins in extracellular
locations where they bind specific carbohydrates in the
presence of Ca2+ ions (Drickamer, 1987, 1988; Weiss, 1992).
Figure 17 illustrates the similarity of the PKD1 lectin
domain to those found in a number of proteins including:
proteoglycans, which interact with collagens and other
components of the extracellular matrix; endocytic receptors;
and selectins, which are involved in cell adhesion and
recognition. Three different selectins have been
identified: E-selectin (endothelium), P-selectin
(platelets) and L-selectin (lymphocytes), and these work with
other cell adhesion molecules to promote binding of the cell
carrying the selectin to various other target cells.
Immunoglobulin-like repeat motif

Significant homologies were detected between a region
of exon 5 and three regions of exon 15, with the same
conserved sequence, WDFGDGS, which is also found in a
melanocyte-specific secreted glycoprotein, Pmel17 (Kwon et
al., 1991) and three prokaryotic collagenases or proteinases
(Ohara et al., 1989; Takeuchi et al., 1992 and Matsushita et
al., 1994). Further analysis of the amino acid sequence of
the PKD1 protein showed that a conserved region of
approximately 85 amino acids could be discerned around this
central sequence and that 16 copies of this repeat were
present in the PKD1 protein; 1 in exon 5 and the other 15 as
a tandem array in exons 11 to 15. Figure 18 shows that a
highly conserved structure is maintained between the repeats
although in some cases less similarity is noted with the
WDFGDGS sequence. Further analysis of the most conserved
residues found in the repeat units showed similarity to
various immunoglobulin (Ig) domains; two Ig repeats which
show particular homology to the PKD1 protein are shown
(Figure 18). The repeat unit is most similar to that found
in a number of cell adhesion and surface receptors which
have recently been defined as the I set of Ig domains
(Harpaz, 1994). Ig repeats consist of 7-9 β strands of 5-10
residues linked by turns which are packed into two β sheets.
The B, C, F and G β strands of the I set are particularly
similar to the PKD1 repeat, although the highly conserved
cysteine residues which stabilise the two β sheets through a
disulphide bond are absent. The D and E β strands, however,
seem less similar and in some cases are significantly
shortened or apparently absent. Further evidence that this
PKD1 repeat has an Ig-like structure is found by analysis of
the secondary structure, with the predominant configuration
found of β strands linked by turns. The WDFGDGS area of the
Ig molecule is one that often has a specific binding
function (Jones et al., 1995) and this sequence may have a
specific binding role in polycystin.
Type III fibronectin-related domains 

Analysis of the secondary structure of the PKD1 protein
beyond the carboxy end of the region of Ig-like repeats
showed a continuation of the β strand and turn structure. No
evidence of further Ig-like repeats could be found in this
area but three pairs of evenly spaced (38-40 aa) tryptophan
and tyrosine residues were noted, which are the most highly
conserved positions of the type III fibronectin repeat, which
has a similar secondary structure to Ig domains. Further
analysis and comparison with other type III fibronectin
domains showed that in total four fibronectin repeats (one
with leucine replacing the conserved tyrosine) could be
recognised in this area with many of the most highly
conserved residues of this domain found in the PKD1 repeat
(Figure 20).

A large number of proteins with Ig-like repeats have
now been described which are involved in cell-cell
interactions and cell adhesion (reviewed in Brummendorf and
Rathjen, 1994), while type III fibronectin (FNIII) domains
are found on extracellular matrix molecules and adhesion
proteins. A number of cell adhesion proteins which are
located mainly on neural cells have both Ig-like and FNIII-
related domains. In these cases the FNIII repeats are
always positioned C-terminal of the Ig-like units and close
to a transmembrane domain; a similar pattern is seen in the
proposed structure of polycystin. These Ig/FNIII containing
proteins such as neuroglian and NrCAM are thought to be
involved in neuron-neuron interactions and the patterning of
the axonal network.
Many cell adhesion proteins of the Ig superfamily are
also involved in communication and signal transduction
mediated through their cytoplasmic tails. These cytoplasmic
regions are known to bind to cytoskeletal proteins and other
intracellular components, and phosphorylation of this part
of the molecule is also thought to affect adhesive
properties of the protein; potential phosphorylation sites
are found in the cytoplasmic tail and one intracellular loop
of polycystin (Figure 20).

Transmembrane regions

Analysis of hydrophobicity predicted that .the deduced 
protein is an integral membrane protein . with a signal 
peotide and multiple transmembrane (TM) domains located in 
10 the ' C- terminal "region. From this analysis 11 regions 
(including the signal peptide) had a mean hydrophobicity 
indice higher than 1.4 and therefore were considered as 
certain membrane ' spanning- domains' (see Experimental 
* Procedures for details). .Three others with a mean 
15 hydrophobicity indice between ; 0.75- 1.0 were- considered as 
* ; putative TM domains. The mdst likely ^topolo'gy* of ■ ■; the 
protein was predicted using TopPed * II programme'' ( see 
Experimental Procedures for details) and the resulting model 
included one putative segment plus the 10 -certain 
20 transmembrane domains and the ^signal peptide. According to 
this model the N- terminal ' end is extracellular and - the 
( highly hydrophobic) ca'rboxy- terminal region is anchored to 
the membrane by 11 membrane -spanning segments, with the 
highly charged carboxy end located 1 in the cytoplasm. This 
25 topology -is supported by the study of N-glycosylat'ion sites 
" with- all but one site, out of a total of -61" predicted, in an 
extracellular location according to the* model, including 11 
in the two large extracellular loops between ; TM regions . 
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However, if degree of Bydrophobicity required to define 
a certain - putative transmembrane region is .altered within 
the model, the predicted number of such domains can change 
to 9 '(excluding the most N--terminal - pair ) or ,13- (with two 
5 new domains defined between TM7 and. .TM8 ) . .This can be 
ascertained by studies with specific antibodies. 
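
The N-glycosylation check used above to support the topology can be illustrated with a minimal sketch. It assumes the standard N-X-S/T sequon (X not proline) as the definition of a predicted site; the function names and the toy sequence and ranges used to exercise it are hypothetical and are not taken from the PKD1 analysis.

```python
import re

# N-X-S/T sequon with X != Pro. The lookahead lets overlapping sequons
# (e.g. "NNTS") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(seq):
    """Return 1-based positions of the Asn residue of each N-X-[S/T] sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(seq.upper())]


def count_by_side(positions, extracellular_ranges):
    """Split sites into (outside, inside) lists.

    extracellular_ranges: 1-based inclusive (start, end) pairs that a
    topology model assigns to the extracellular side (assumed input).
    """
    outside = [p for p in positions
               if any(lo <= p <= hi for lo, hi in extracellular_ranges)]
    inside = [p for p in positions if p not in outside]
    return outside, inside
```

A topology model is more plausible the fewer predicted sites fall on the cytoplasmic side, since N-glycosylation occurs only on the lumenal/extracellular face.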

Most transmembrane proteins containing the types of cell adhesion
domain found on polycystin have a single transmembrane domain. The
role of the multiple membrane-spanning domains found in polycystin is
not yet clear.
Proposed structure of the PKD1 protein

From the detailed analysis of the predicted PKD1 protein sequence a
model of the likely structure of the protein can be formulated
(Figure 20). This model predicts an extracellular N-terminal region
of approximately 2550 aa containing several distinctive extracellular
domains and an intracellular C-terminus of approximately 225 aa. The
intervening region of nearly 1500 aa is associated with the membrane,
with 11 transmembrane regions predicted and 10 variously sized
extracellular and cytoplasmic loops (see Figure 20). A proline-rich
hinge is found between the flank-LRR-flank region and the first
Ig-like repeat. Two phosphorylation sites for tyrosine kinase and
protein kinase C are found in cytoplasmic locations (Figures 15 and
20). Therefore, analysis of the PKD1 protein, named polycystin, has
highlighted several clear domains, plus a reiterated motif that
occupies over 30% of the protein.

Characterisation of the PKD1 gene has proven to be a uniquely
difficult problem because most of the gene lies in a region which is
reiterated elsewhere on the chromosome. The high degree of similarity
between the two areas (>97%), both in exons and introns, has meant
that a novel approach has been required to clone the full length
transcript, involving extensive genomic sequencing and generating
cDNAs from a cell line with the PKD1 but not the HG loci. In this way
a contig containing the entire PKD1 transcript has now been cloned.

Preliminary analysis shows that the HG genes are very similar to PKD1
both in terms of genomic structure and sequence over most of their
length (apart from the novel 3' regions). The 5' end of the PKD1 gene
is at a CpG island which lies within the duplicated area. Homologous
areas to this island in the HG region also have cleavable sites for
methylation-sensitive enzymes; these duplicate islands probably lie
at the 5' ends of the various HG genes. Analysis for DNAase
hypersensitivity also indicates that the HG CpG islands probably
contain active promoters. These results are consistent with the
observation of polyadenylated mRNA from the HG genes on Northern
blots and the similarity of the expression pattern of the HG and PKD1
genes in different tissue-specific cell lines. The HG genes may have
complete open reading frames and may encode functional proteins.
Antibodies to their unique 3' regions will be required to determine
this. Although the PKD1
transcript is large, the overall size of the gene, at 52 kb, is not
(the Duchenne muscular dystrophy (DMD) gene, which encodes a slightly
smaller transcript, has a genomic size of over 2 Mb). Indeed, if the
first intron of PKD1 is excluded from the analysis, 40.3% of the
remainder of the gene is found in the mature mRNA. In the compact
structure of the PKD1 gene, some of the introns are close to or
smaller than the minimal size of about 80 bp thought to be required
for efficient splicing, although they are presumably excised
effectively. We have shown that deletion of 18 or 20 bp from one
small intron (intron 43), resulting in an intron of 55 or 57 bp,
leads to aberrant splicing (Peral, 1995). Similar mutations may be
found in the other small introns of this gene. The compact nature of
the PKD1 gene probably reflects the GC-rich area of the genome in
which it is found (the PKD1 transcript has a total GC content of
about 65%); a similar organisation is seen in other genes from the
area of chromosome 16 (Vyas, 1992) [...] as in an AT-rich genomic
region.
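
The two compactness measures discussed above (intron length relative to the roughly 80 bp minimum, and the fraction of the gene found in mature mRNA) can be computed from exon coordinates. The following is a minimal sketch under stated assumptions: exon coordinates are 1-based, inclusive and sorted 5' to 3'; the function names are hypothetical, and no real PKD1 coordinates are included.

```python
MIN_INTRON_BP = 80  # approximate minimum for efficient splicing quoted in the text


def intron_lengths(exons):
    """Lengths of the introns between consecutive exons.

    exons: list of (start, end) tuples, 1-based inclusive, sorted 5'->3'.
    """
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def small_introns(exons, threshold=MIN_INTRON_BP):
    """Return (intron_number, length) for introns shorter than threshold."""
    return [(i + 1, n)
            for i, n in enumerate(intron_lengths(exons))
            if n < threshold]


def exonic_fraction(exons):
    """Fraction of the genomic span (first exon start to last exon end)
    that is retained in the mature mRNA."""
    exonic = sum(end - start + 1 for start, end in exons)
    span = exons[-1][1] - exons[0][0] + 1
    return exonic / span
```

To reproduce a figure that excludes the first intron, as in the text, the first exon would be dropped from the list before calling `exonic_fraction`.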

It is clear that polycystin has many features of a cell adhesion or
recognition molecule, with multiple different extracellular domains.
These various binding domains are likely to have different
specificities, so it can be envisaged that it will bind to a variety
of different proteins (and carbohydrates) both on other cells and
possibly in the extracellular matrix. Although provisional evidence
indicates a wide range of expression of polycystin in tissue-specific
cell lines, detailed analysis by in situ hybridisation of the mRNA
and with antibodies, to determine the cells expressing this protein
both in adult tissue and during development, will provide further
evidence.

Initial analysis has revealed little clear evidence of alternate
splicing, although one cDNA (out of 6 studied) had an extra exon of
255 bp positioned in intron 16. This exon contains an in-frame stop
codon, and it is not known at this stage if this represents an
incompletely spliced mRNA or a splice form of polycystin which
terminates at this point. Truncation of the protein here would leave
a secreted protein lacking all of the transmembrane and cytoplasmic
regions. Interestingly, a similar secreted form of the neural
adhesion protein NCAM, which is normally attached to the cell
membrane, is produced by alternate splicing through insertion of an
exon containing a stop codon (Gower et al., 1988).

The initial changes that have been noted in ADPKD kidneys are
abnormal thickening and splitting of the basement membrane (BM) and
simultaneous de-differentiation of associated epithelial cells at the
point of tubular dilation. Similar results have been noted in the
heterozygote Han:SPRD rat (Schafer et al., 1994), which is a dominant
model of PKD, although it is not known if it is a rat model of PKD1.
Concurrent changes in cellular characteristics and the BM suggest
that a disruption or alteration of communication between the cell and
the BM may be the primary change in this disease. Polycystin could
play an important role in interaction and communication between
epithelial cells and the BM. It is known that signals are required
from cells to the extracellular matrix (ECM) for normal BM
development and also that communication from the ECM to cells is
required for control of cellular differentiation. Communication
between the ECM and cells occurs by several different means,
including through integrins, and so polycystin may bind to integrins,
although it may interact directly with components of the ECM.
Although ADPKD is generally a disease of adulthood, there is plenty
of evidence that the cystic changes in the kidney may start much
earlier (Milutinovic et al., 1970), even in utero (Reeders, 1986).
Expression of polycystin during renal development may be when its
major role occurs, perhaps in assembly of the BM, and it is then that
the errors which later lead to cyst development occur.

The plethora of connective tissue abnormalities associated with ADPKD
indicates that the adhesion/communication roles of polycystin may be
important for assembly and/or maintenance of the BM in many tissues,
as well as the kidney. Hence, it is possible that disruption of
normal cell adhesion and communication mediated by polycystin may
explain the primary defects seen in the kidney and other organs in
ADPKD. Clearly, molecules that interact with polycystin or have a
similar role are candidates for the other renal polycystic diseases
of man.

A study of the mutations of the PKD1 gene highlights important
functional regions of the protein. All of the mutations described so
far in typical PKD1 families involve deletion or other disruption in
the 3' end of the gene. Two large deletions detected on Southern
blots remove a large part of the protein (or make an out-of-frame
product), including the last 6 transmembrane domains and the
C-terminal end. The in-frame splicing change described in the same
paper would remove most of TM10 and part of the preceding cytoplasmic
loop. Two recently described splicing mutations (Peral, 1995) create
three different products which either delete part of the cytoplasmic
loop between TM7 and TM8, or delete a larger region of this loop
including part of TM7, or insert an extra region into that loop.
These mutated genes may make functional protein (they all produce
abnormal mRNA), and it is interesting to note that, in each case,
these proteins would have an intact extracellular region with
disrupted cytoplasmic and transmembrane areas. Such proteins may bind
to extracellular targets but be unable to communicate in a normal
way.

A group of mutations of PKD1 which completely delete the gene, and
hence are clearly inactivating, have been described (Brook-Carter,
1994). However, in each of these cases the deletions also disrupt the
adjacent TSC2 gene, making interpretation of these cases difficult
(TSC2 mutations alone can cause the development of renal cysts).
Nevertheless, the severity of the polycystic disease in these
patients indicates that inactivation of one PKD1 allele does promote
cyst development. Furthermore, as these children are often severely
affected at birth, cyst formation must occur in utero in these cases,
and hence polycystin has an important developmental role. A second
somatic hit in the target tissue may also be required in these cases
(and normal PKD1 patients) before cyst development can occur.
PKD1 GENE AND POLYCYSTIC KIDNEY DISEASE

We therefore have compelling evidence that mutations of the PKD1 gene
give rise to the typical phenotype of ADPKD. The location of this
gene within the PKD1 candidate region and the available genetic
evidence from the families with mutations show that this is the PKD1
gene. The present invention therefore includes the complete PKD1 gene
itself and the six PKD1-associated mutations which have been
described: a de novo translocation, which was subsequently
transmitted with the phenotype; two intragenic deletions (one a de
novo event); two further deletions; and a splicing defect.

It has been argued that PKD1 could be recessive at the cellular
level, with a second somatic mutation required to give rise to cystic
epithelium (Reeders, 1992). This "two hit" process is thought to be
the mutational mechanism giving rise to several dominant diseases,
such as neurofibromatosis (Legius, et al., 1993) and tuberous
sclerosis (Green, et al., 1994), which result from a defect in the
control of cellular growth. If this were the case, however, we might
expect that a proportion of constitutional PKD1 mutations would be
inactivating deletions, as seen in these other disorders.

The location of the PKD1 mutations may, however, reflect some
ascertainment bias, as it is this single copy area which has been
screened most intensively for mutations. Nevertheless, no additional
deletions were detected when a large part of the gene was screened by
FIGE, and studies by PFGE showed no large deletions of this area in
75 PKD1 patients. It is possible that the mutations detected so far
result in the production of an abnormal protein which causes disease
through a gain of function. However, it is also possible that these
mutations eliminate the production of functional protein from this
chromosome and result in the PKD1 phenotype by haploinsufficiency, or
only after loss of the second PKD1 homologue by somatic mutation.

At least one mutation which seems to delete the entire PKD1 gene has
been identified (WS-53), but in this case it also disrupts the
adjacent TSC2 gene and the resulting phenotype is of TSC with severe
cystic kidney disease. Renal cysts are common in TSC, so the
phenotypic significance of deletion of the PKD1 gene in this case is
difficult to assess. It is clear that not all cases of renal cystic
disease in TSC are due to disruption of the PKD1 gene; chromosome
9-linked TSC (TSC1) families also manifest cystic kidneys, and we
have analysed many TSC2 patients with kidney cysts who do not have
deletion of the PKD1 gene.

Preliminary analysis of the PKD1 protein sequence has highlighted two
regions which provide some clues to the possible function of the PKD1
gene. At the extreme 5' end of the characterised region are two
leucine-rich repeats (LRRs) (amino acids 29-74) flanked by
characteristic amino flanking (amino acids 6-28) and carboxy flanking
sequences (amino acids 76-133) (Rothberg et al., 1990). LRRs are
thought to be involved in protein-protein interactions (Kobe and
Deisenhofer, 1994), and the flanking sequences are only found in
extracellular proteins. Other proteins with LRRs flanked on the amino
and carboxy sides are receptors or are involved in adhesion or
cellular signalling. Further 3' on the protein (amino acids 350-515)
is a C-type lectin domain (Curtis et al., 1992). This indicates that
this region binds carbohydrates and is also likely to be
extracellular. These two regions of homology indicate that the 5'
part of the PKD1 protein is extracellular and involved in
protein-protein interactions. It is possible that this protein is a
constituent of, or plays a role in assembling, the extracellular
matrix (ECM) and may act as an adhesive protein in the ECM. It is
also possible that the extracellular portion of this protein is
important in signalling to other cells. The function of much of the
PKD1 protein is still not fully known, but the presence of several
hydrophobic regions indicates that the protein may be threaded
through the cell membrane.

Familial studies indicate that de novo mutations probably account for
only a small minority of all ADPKD cases; a recent study detected 5
possible new mutations in 209 families (Davies, et al., 1991).
However, in our study one of three intragenic mutations detected was
a new mutation, and the PKD1-associated translocation was also a de
novo event. Furthermore, the mutations detected in the two familial
cases do not account for a significant proportion of the local PKD1.
The OX875 deletion was only detected in 1 of 282 unrelated cases, and
the splicing defect was seen in only 1 of 48 unrelated cases.
Nevertheless, studies of linkage disequilibrium have found evidence
of common haplotypes associated with PKD1 in a proportion of some
populations (Peral, et al., 1994; Snarey, et al., 1994), suggesting
that common mutations will be identified.

Once a larger range of mutations has been characterised it will be
possible to evaluate whether the type and location of mutation
determines disease severity, and if there is a correlation between
mutation and extra-renal manifestations. Previous studies have
provided some evidence that the risk of cerebral aneurysms 'runs
true' in families (Huston, et al., 1993) and that some PKD1 families
exhibit a consistently mild phenotype (Ryynanen, et al., 1987). A
recent study has concluded that there is evidence of anticipation in
ADPKD families, especially if the disease is transmitted through the
mother (Fink, et al., 1994). Furthermore, analysis of families with
early manifestations of ADPKD shows that there is a significant
intra-familial recurrence risk and that childhood cases are most
often transmitted maternally (Fink, et al., 1993; Zerres, et al.,
1993). This pattern of inheritance is reminiscent of that seen in
diseases in which an expanded trinucleotide repeat was found to be
the mutational mechanism (reviewed in Mandel, 1993). However, no
evidence for an expanding repeat correlating with PKD1 has been found
in this region, although such a sequence cannot be excluded.

There is ample evidence that early presymptomatic diagnosis of PKD1
is helpful because it allows complications such as hypertension and
urinary tract infections to be monitored and treated quickly (Ravine,
et al., 1991). The identification of mutations within a family allows
rapid screening of that and other families with the same mutation.
However, genetic linkage analysis is likely to remain important for
presymptomatic diagnosis. The accuracy and ease of linkage-based
diagnosis will be improved by the identification of the PKD1 gene, as
a microsatellite lies in the 3' untranslated region of this gene
(KG-8) and several CA repeats are located 5' of the gene (see Figures
1a and 6; Peral, et al., 1994; Snarey, et al., 1994).
Experimental Procedures
Clinical Details of Patients

Family 77

77-2 and 77-3 are 48 and 17 years old, respectively, and have typical
ADPKD. Both have bilateral polycystic kidneys and 77-2 has impaired
renal function. Neither patient manifests any signs of TSC (apart
from cystic kidneys) on clinical and ophthalmological examination or
by CT scan of the brain.

77-4 is 13 years old, severely mentally retarded, and has multiple
signs of TSC including adenoma sebaceum, depigmented macules and
periventricular calcification on CT scan. Renal ultrasound reveals a
small number of bilateral renal cysts.

ADPKD patients

OX875 developed ESRD from ADPKD, aged 46. Progressive decline in
renal function had been observed over 17 years; ultrasound
examinations documented enlarging polycystic kidneys with less
extensive hepatic cystic disease. Both kidneys were removed after
renal transplantation and pathological examination showed typical
advanced cystic disease in kidneys weighing 1920 g and 340 g (normal
average 120 g).

OX114 developed ESRD from ADPKD aged 54; diagnosis was made by
radiological investigation during an episode of abdominal pain aged
25. A progressive decline in renal function and the development of
hypertension were subsequently observed. Ultrasonic examination
demonstrated enlarged kidneys with typical cystic disease, with less
severe hepatic involvement.

OX32 is a member of a large kindred affected by typical ADPKD in
which several members have developed ESRD. The patient himself has
been observed for 12 years with progressive renal failure and
hypertension following ultrasonic demonstration of polycystic
kidneys.

No signs of TSC were observed on clinical examination of any of the
ADPKD patients.
DNA Electrophoresis and Hybridisation

DNA extraction, restriction digests, electrophoresis, Southern
blotting, hybridisation and washing were performed by standard
methods or as previously described (Harris, et al., 1990). FIGE was
performed with the Biorad FIGE Mapper using programme 5 to separate
fragments from 25-50 kb. High molecular weight DNA for PFGE was
isolated in agarose blocks and separated on the Biorad CHEF DRII
apparatus using appropriate conditions.

Genomic DNA probes and somatic cell hybrids
Many of the DNA probes used in this study have been described
previously: MS205.2 (D16S309; Royle et al., 1992); GGG1 (D16S259;
Germino, et al., 1990); N54 (D16S139; Himmelbauer, et al., 1991); SM6
(D16S665), CW23, CW21 and JH1 (European Chromosome 16 Tuberous
Sclerosis Consortium, 1993). Microsatellite probes for haplotype
analysis were KG8 and W5.2 (Snarey, et al., 1994); SM6, CW3 and CW2
(Peral, et al., 1994); 16AC2.5 (Thompson, et al., 1992); SM7 (Harris,
et al., 1991); VK5AC (Aksentijevich, et al., 1993).

New probes isolated during this study were: JH4, JH5 and JH6, 11 kb,
6 kb and 6 kb BamH I fragments, respectively, and JH13 and JH14, 4 kb
and 2.8 kb BamH I-EcoR I fragments, respectively, all from the cosmid
JH2A; JH8 and JH10 are 4.5 kb and 2 kb Sac I fragments, respectively,
and JH12 a 0.6 kb Sac I-BamH I fragment, all from JH4; 8S1 and 8S3
are 2.4 kb and 0.6 kb Sac II fragments, respectively, from JH8; CW10
is a 0.5 kb Not I-Mlu I fragment of SM25A; JH17 is a 2 kb EcoR I
fragment of NM17.

The somatic cell hybrids N-OH1 (Germino, et al., 1990), P-MWH2A
(European Chromosome 16 Tuberous Sclerosis Consortium, 1993) and
Hy145.19 (Himmelbauer, et al., 1991) have previously been described.
Somatic cell hybrids containing the paternally derived (BP2-10) and
maternally derived (BP2-9) chromosomes from OX114 were produced by
the method of Deisseroth and Hendrick (1979).
Constructing a cosmid contig

Cosmids were isolated from chromosome 16-specific and total genomic
libraries, and a contig was constructed using the methods and
libraries previously described (European Chromosome 16 Tuberous
Sclerosis Consortium, 1993). To ensure that cosmids were derived from
the 16p13.3 region (not the duplicate 16p13.1 area), probes from the
single copy area were initially used to screen libraries (e.g. CW21
and N54). Two cosmids mapped entirely within the area duplicated,
CW10III and JC10.2B. To establish that these were from the PKD1 area,
they were restriction mapped and hybridised with the probe CW10. The
fragment sizes detected were compared to results obtained with
hybrids containing only the 16p13.3 area (Hy145.19) or only the
16p13.1 region (P-MWH2A).
FISH 

FISH was performed essentially as previously described (Buckle and
Rack, 1993). The hybridisation mixture contained 100 ng of
biotin-11-dUTP labelled cosmid DNA and 2.5 µg human Cot-1 DNA (BRL),
which was denatured and annealed at 37°C for 15 min prior to
hybridisation at 42°C overnight. After stringent washes the site of
hybridisation was detected with successive layers of
fluorescein-conjugated avidin (5 µg/ml) and biotinylated anti-avidin
(5 µg/ml) (Vector Laboratories). Slides were mounted in Vectashield
(Vector Laboratories) containing 1 µg/ml propidium iodide and 1 µg/ml
4',6-diamidino-2-phenylindole (DAPI), to allow concurrent G-banded
analysis under UV light. Results were analysed and images captured
using a Bio-Rad MRC 600 confocal laser scanning microscope.
cDNA screening and characterisation

Foetal brain cDNA libraries in λ phage (Clonetech and Stratagene)
were screened by standard methods with genomic fragments in the
single copy area (equivalent to CW23 and CW21) or with a 0.8 kb Pvu
II-EcoR I single copy fragment of AH3. Six PBP cDNAs were
characterised: AH4 (1.7 kb) and 3A3 (2.0 kb) are described in
European Chromosome 16 Tuberous Sclerosis Consortium, 1993, and four
novel cDNAs, AH3 (2.2 kb), AH6 (2.0 kb), A1C (2.2 kb) and B1E (2.9
kb). A striatum library (Stratagene) was screened with JH4 and a HG-C
cDNA, 11BHS21 (3.8 kb), was isolated; 21P.9 is a 0.9 kb Pvu II-EcoR I
subclone of this cDNA. A HG-A or HG-B cDNA, HG-4 (7 kb), was also
isolated by screening the foetal brain library (Stratagene) with JH8.
HG-4/1.1 is a 1.1 kb Pvu II-EcoR I fragment from the 3' end of HG-4.
1A1H.6 is a 0.6 kb Hind III-EcoR I subclone of a TSC2 cDNA, 1A-1 (1.7
kb), which was isolated from the Clonetech library. Each cDNA was
subcloned into Bluescript and sequenced utilising a combination of
sequential truncation and oligonucleotide primers using DyeDeoxy
Terminators (Applied Biosystems) and an ABI 373A DNA Sequencer
(Applied Biosystems) or by hand with Sequenase T7 DNA polymerase
(USB).
RNA Procedures

Total RNA was isolated from cell lines and tissues by the method of
Chomczynski and Sacchi (1987) and enrichment for mRNA made using the
PolyATtract mRNA Isolation System (Promega). For RNA electrophoresis
0.5% agarose denaturing formaldehyde gels were used, which were
Northern blotted, hybridised and washed by standard procedures. The
0.24-9.5 kb RNA (Gibco BRL) size standard was used, and hybridisation
of the probe (1-9B3) to the 13 kb Utrophin transcript (Love, et al.,
1989) in total fibroblast RNA was used as a size marker for the large
transcripts.

RT-PCR was performed with 2.5 µg of total RNA by the method of Brown
et al. (1990) with random hexamer primers, except that AMV reverse
transcriptase (Life Sciences) was employed. To characterise the
deletion of the PBP transcript in OX114 we used the primers:

AH3 F9 5' TTT GAC AAG CAC ATC TGG CTC TC 3'
AH3 B7 5' TAC AGC AGG AGG CTC CGC AG 3'
in a DMSO-containing PCR buffer (Dode, et al., 1990) with 0.5 mM
MgCl2 and 36 cycles of: 94°C, 1 min; 61°C, 1 min; 72°C, 2 min, plus a
final extension of 10 min. The 3A3 C primers used to amplify the OX32
cDNA and DNA were:
3A3 C1 5' CGC CGC TTC ACT AGC TTC GAC 3'
3A3 C2 5' ACG CTC CAG AGG GAG TCC AC 3'

These were employed in a PCR buffer and cycle previously described
(Harris, et al., 1991) with 1 mM MgCl2 and an annealing temperature
of 61°C.

PCR products for sequencing were amplified with Pfu-1 (Stratagene)
and ligated into the Srf-1 site in PCR-Script (Stratagene) in the
presence of Srf-1.
RNAse protection

Tissues from normal and end-stage polycystic kidneys were immediately
homogenised in guanidinium thiocyanate. RNA was purified on a cesium
chloride gradient and 30 µg total RNA was assayed by RNAse protection
by the method of Melton, et al. (1984) using a genomic template
generated with the 3A3 C primers.

Heteroduplex Analysis
Heteroduplex analysis was performed essentially as described by Keen
et al. (1991). Samples were amplified from genomic DNA with the 3A3 C
primers, heated at 95°C for 5 minutes and incubated at room
temperature for at least 30 minutes before loading on a Hydrolink gel
(AT Biochem). Hydrolink gels were run for 12-18 hours at 250 V and
fragments observed after staining with ethidium bromide.
Extraction and amplification of paraffin-embedded DNA 

DNA from formalin-fixed, paraffin wax-embedded kidney tissue was
prepared by the method of Wright and Manos (1990), except that after
proteinase K digestion overnight at 55°C, the DNA was extracted with
phenol plus chloroform before ethanol precipitation. Approximately 50
ng of DNA was used for PCR with 1.5 mM MgCl2 and 40 cycles of 94°C
for 1 min, 50°C for 1 min and 72°C for 40 s, plus a 10 min extension
at 72°C.

The oligonucleotide primers designed to amplify across the genomic
deletion of OX875 were:
AHF2: 5'-GGG CAA GGG AGG ATG ACA AG-3'

JH14B3: 5'-GGG TTT ATC AGC AGC AAG CGG-3'

which produced a product of about 220 bp in individuals with the
OX875 deletion.
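
As a quick sanity check on primer pairs such as the one above, length, GC content and a rough melting temperature can be computed directly from the sequences. The sketch below uses the Wallace rule (2°C per A/T, 4°C per G/C), which is only a coarse approximation for short oligonucleotides and is an assumption introduced here; it is not the method by which the annealing temperatures in this document were chosen. The function name is hypothetical.

```python
def primer_stats(seq):
    """Return length, GC percentage and Wallace-rule Tm for a DNA primer.

    Whitespace in the input (e.g. triplet spacing) is ignored.
    """
    s = "".join(seq.split()).upper()
    if not s or set(s) - set("ACGT"):
        raise ValueError("primer must contain only A, C, G, T")
    gc = s.count("G") + s.count("C")
    at = len(s) - gc
    return {
        "length": len(s),
        "gc_percent": round(100.0 * gc / len(s), 1),
        "tm_wallace": 2 * at + 4 * gc,  # degrees C, rough estimate only
    }


if __name__ == "__main__":
    primers = {
        "AHF2": "GGG CAA GGG AGG ATG ACA AG",
        "JH14B3": "GGG TTT ATC AGC AGC AAG CGG",
    }
    for name, seq in primers.items():
        print(name, primer_stats(seq))
```

Both primers come out at 12 G/C residues with estimates a couple of degrees apart, which is the kind of balance normally sought in a primer pair.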
3' RACE analysis of WS-212

3' RACE was completed essentially as described (European Polycystic
Kidney Disease Consortium, 1994). Reverse transcription was performed
with 5 µg total RNA with 0.5 µg of the hybrid dT17 adapter primer
using conditions previously described (Frohman et al., 1988). A
specific 3' RACE product was amplified with the primer FS and adapter
primer in 0.5 mM MgCl2 with the program: 57°C, 60 s; 72°C, 15 minutes
and 30 cycles of 95°C, 40 s; 57°C, 60 s; 72°C, 60 s plus 72°C, 10
minutes. The amplified product was cloned using the TA cloning system
(Invitrogen) and sequenced by conventional methods.

Genomic and cDNA probes and somatic cell hybrids

The genomic clones CW21, JH5, JH6, JH8, JH10, JH12, JH13 and JH14 and
the cDNAs A1C, AH3, 3A3 and AH4 are described herein. Newly described
probes are: SM3, a 2.0 kb BamH I subclone of the cosmid SM11; JH9, a
2.4 kb Sac I fragment, and JH11, a 1.2 kb Sac I-BamH I fragment, both
from JH4. See Eur. Polycystic Kidney Disease Consortium, 1994 and
Eur. Chromosome 16 Tuberous Sclerosis Consortium, 1993 for all above
clones. DFS5 is a 4.2 kb Not I-Hind III fragment of CW23 (Eur.
Chromosome 16 Tuberous Sclerosis Consortium, 1993). The cDNAs BPG4,
BPG6, BPG7C and 13-A were isolated from a fetal brain cDNA library in
λ phage (Stratagene) and are 7 kb, 2 kb, 4.5 kb and 1.2 kb
respectively.

The somatic cell hybrids have previously been described: P-MWH2A
(Eur. Chromosome 16 Tuberous Sclerosis Consortium, 1993) and Hy145.19
(Himmelbauer et al., 1991).
Exon linking

Total cellular RNA from the radiation hybrid Hy145.19 was reverse
transcribed using random hexamers (Eur. Polycystic Kidney Disease
Consortium, 1994). This material was used as a template for PCR using
the proofreading polymerase Pfu-1 with the primer pairs described in
Table 2. The resultant products were cloned into the Srf-1 site of
pPCRscript (SK+) plasmid.
Sequencing 

Full length sequence was obtained from the genomic clones, HG cDNAs
and exon link clones using the progressive unidirectional deletion
technique of Henikoff (1984). Both strands were then sequenced using
DyeDeoxy Terminator Cycle Sequencing and an Applied Biosystems
Sequencer 373A. Contig assembly was done using the programmes
Assembly line (vs 1.0.7), SeqEd (vs 1.03) and MacVector (4.1.4).
Primer Extension

Primer extension was performed on total cellular fibroblast RNA. 25
µg of RNA was annealed at 60°C in the presence of 400 mM NaCl to 0.01
pM of HPLC pure oligonucleotide which had been end labelled to a
specific activity of 3 x 10" 1 cpm/pM with 32P. Primer extension was
then performed in the presence of 50 mM Tris pH 8.2, 10 mM DTT, 6 mM
MgCl2, 25 µg/ml Actinomycin D, 0.5 mM dNTPs, and 8 units of AMV
reverse transcriptase. The extension reaction was continued for 60
min at 42°C. The extension products were compared to a sequencing
ladder generated using the same primer on the genomic clone SM3. The
primers used were:
N2765: 5'-GGCGCPGCGGGCGGCATCGTTAGGGCAGCG-3'
N5496: 5'-QGCGGGCGGCATCGTTAGGGCAGCGCGCGC-3'
N5495: 5'-ACCTGCTGCTGAGCGACGCCCGCTCGGGGC-3'
Analysis of sequence homology 

The predicted PKD1 protein was analyzed for homologies with
known proteins in the SwissProt and NBRF databases using the
BLAST (Altschul et al., 1990) and FASTA (Pearson et al., 1988)
algorithms. Layouts were prepared by hand and using the
programme Pileup.
Transmembrane regions 

Potential transmembrane segments were identified by the method
of Sipos and von Heijne (Sipos et al., 1993), using the GES
hydrophobicity scale (Engelman et al., 1986) and a trapezoid
sliding window (a full window of 21 residues and a core window
of 11 residues), as recommended. Candidate transmembrane
domains were selected on the basis of their average
hydrophobicity <H>, and were classified as certain (<H> > 1.0)
or putative (0.6 < <H> < 1).
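The window average described above can be sketched in code. This is a minimal illustration only, not the TopPred implementation. The GES values are given as they are commonly tabulated, with the sign flipped so that hydrophobic residues score positive, and should be checked against Engelman et al. (1986) before use. The linear taper of the wedge weights and all function names are assumptions made for this example.

```python
# GES scale, sign flipped so hydrophobic residues are positive (assumed
# tabulation; verify against Engelman et al., 1986).
GES = {
    "F": 3.7, "M": 3.4, "I": 3.1, "L": 2.8, "V": 2.6, "C": 2.0, "W": 1.9,
    "A": 1.6, "T": 1.2, "G": 1.0, "S": 0.6, "P": -0.2, "Y": -0.7, "H": -3.0,
    "Q": -4.1, "N": -4.8, "E": -8.2, "K": -8.8, "D": -9.2, "R": -12.3,
}


def trapezoid_weights(full=21, core=11):
    """Weights of 1 over the core, tapering linearly over the two wedges.

    The exact taper is an assumption for illustration.
    """
    half_full = full // 2
    half_core = core // 2
    wedge = half_full - half_core
    weights = []
    for d in range(-half_full, half_full + 1):
        if abs(d) <= half_core:
            weights.append(1.0)
        else:
            weights.append((half_full + 1 - abs(d)) / (wedge + 1))
    return weights


def hydrophobicity_profile(seq, scale=GES, full=21, core=11):
    """Return {centre index: weighted average <H>} for every full window."""
    weights = trapezoid_weights(full, core)
    total = sum(weights)
    half = full // 2
    profile = {}
    for centre in range(half, len(seq) - half):
        window = seq[centre - half:centre + half + 1]
        score = sum(w * scale[aa] for w, aa in zip(weights, window))
        profile[centre] = score / total
    return profile


def classify(h, certain=1.0, putative=0.6):
    """Apply the cutoffs quoted in the text to an average hydrophobicity."""
    if h > certain:
        return "certain"
    if h > putative:
        return "putative"
    return None
```

A peak in the profile above 1.0 would be reported as a certain segment, and one between 0.6 and 1.0 as putative; anything lower is ignored.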

The best topology for the protein was predicted on the basis
of three different criteria: a) the net charge difference
between the 15 N-terminal and the 15 C-terminal residues
flanking the most N-terminal transmembrane segment (Hartmann
et al., 1989); b) the difference in positively charged
residues between the two sides of the membrane in loops
smaller than 60 residues, and c) the analysis of the overall
amino acid composition of loops longer than 60 residues by the
compositional distance method (Nakashima et al., 1992). Using
the above criteria the TopPred II program (Sipos et al., 1993)
calculated all the possible topologies of the proteins
including the certain transmembrane segments and either
included or excluded each of the putative segments to
determine the most likely structure.
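Two of the steps above lend themselves to a short sketch: criterion (a), the net charge difference across the first transmembrane segment, and the enumeration of candidate models in which each putative segment is either included or excluded. The code below is an illustration under stated assumptions, not the TopPred II code: Lys and Arg are counted as +1, Asp and Glu as -1, segments are given as hypothetical 0-based (start, end) pairs with the end exclusive, and the function names are invented for this example.

```python
from itertools import combinations


def net_charge(peptide):
    """Net charge counting K/R as +1 and D/E as -1 (assumed convention)."""
    return sum((aa in "KR") - (aa in "DE") for aa in peptide)


def flank_charge_difference(seq, tm_start, tm_end, flank=15):
    """Charge of the N-terminal flank minus charge of the C-terminal flank.

    tm_start is the first residue index of the segment, tm_end is one past
    its last residue (0-based, end exclusive).
    """
    n_flank = seq[max(0, tm_start - flank):tm_start]
    c_flank = seq[tm_end:tm_end + flank]
    return net_charge(n_flank) - net_charge(c_flank)


def candidate_segment_sets(certain, putative):
    """Yield every model: all certain segments plus any subset of putative ones."""
    for k in range(len(putative) + 1):
        for chosen in combinations(putative, k):
            yield sorted(list(certain) + list(chosen))
```

With n putative segments this yields 2^n candidate models, each of which would then be scored by the three criteria.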
PKD1 Protein Purification

The PKD1 protein may be purified according to conventional
protein purification procedures well known in the art.
Alternatively, the protein may be purified from cells
harboring a plasmid containing an expressible PKD1 gene. For
example, the protein may be expressed in an E. coli expression
system and purified as follows.
Cells are grown in a 10 liter volume in a Chemap Fermentor
(Chemapec, Woodbury, NY) in 2% medium. Fermentation
temperature may be 37°C, pH 6.8, and air is provided at
1 vvm. Plasmid selection may be provided using ampicillin for
a plasmid containing an ampicillin resistance gene. Typical
yield (wet weight) is 30 g/l.

For cell lysis, 50g wet cell weight of E. coli containing the
recombinant PKD1 plasmid may be resuspended in a final volume
of 100ml in 50 mM Tris-HCl pH 8.0, 5 mM EDTA, 5 mM DTT, 15 mM
mercaptoethanol, 0.5% Triton X-100, and 5 mM PMSF. 300 mg
lysozyme is added to the suspension, which is incubated for
30 min at room temperature. The material is then lysed using
a BEAD BEATER (R) (Biospec Products, Bartlesville, OK)
containing an equal volume of 0.1-0.15 um glass beads. The
liquid is separated from the beads and the supernatant
removed, the pellet dissolved in 20 mM Tris-Cl pH 8.0.

The protein may be purified from the supernatant using DEAE
chromatography, as is well known in the art.
Preparation of Antibodies

Antibodies specific for PKD1 protein, or a fragment thereof,
are prepared as follows. A peptide corresponding to at least
8 amino acid residues of the PKD1 sequence of Fig. 15 is
synthesized. Coupling of the peptide to carrier protein and
immunizations are performed as described (Dymecki, S.M., J.
Biol. Chem. 267:4815-4823, 1992). Rabbit antibodies against
this peptide are raised and sera are titered against peptide
antigen by ELISA. The sera exhibiting the highest titer
(1:27,000) are most useful.

Techniques for preparing monoclonal antibodies are well known,
and monoclonal antibodies of this invention may be prepared by
using the synthetic polypeptides of this invention, preferably
bound to a carrier, as the immunogen as was done by Arnheiter
et al., Nature 294, 278-280 (1981).

Monoclonal antibodies are typically obtained from hybridoma
tissue cultures or from ascites fluid obtained from animals
into which the hybridoma tissue was introduced. Nevertheless,
monoclonal antibodies may be described as being "raised to" or
"induced by" the synthetic polypeptides of this invention or
their conjugates with a carrier.

Antibodies are utilized along with an "indicating group", also
sometimes referred to as a "label". The indicating group or
label is utilized in conjunction with the antibody as a means
for determining whether an immune reaction has taken place,
and in some instances for determining the extent of such a
reaction.

The indicating group may be a single atom as in the case of
radioactive elements such as iodine 125 or 131, hydrogen 3 or
sulfur 35, or NMR-active elements such as fluorine 19 or
nitrogen 15. The indicating group may also be a molecule such
as a fluorescent dye like fluorescein, or an enzyme, such as
horseradish peroxidase (HRP), or the like.

The terms "indicating group" or "label" are used herein to
include single atoms and molecules that are linked to the
antibody or used separately, and whether those atoms or
molecules are used alone or in conjunction with additional
reagents. Such indicating groups or labels are themselves
well-known in immunochemistry and constitute a part of this
invention only insofar as they are utilized with otherwise
novel antibodies, methods and/or systems.

Detection of PKD1 and Subcellular Localization

Another embodiment of this invention relates to an assay for
the presence of PKD1 protein in cells. Here, an
above-described antibody is raised and harvested. The antibody
or idiotype-containing polyamide portion thereof is then
admixed with candidate tissue and an indicating group. The
presence of the naturally occurring amino acid sequence is
ascertained by the formation of an immune reaction as signaled
by the indicating group. Candidate tissues include any tissue
or cell line or bodily fluid to be tested for the presence of
PKD1.

Metabolic labeling, immunoprecipitation, and immunolocalization
assays are performed in cells as described previously (Furth,
M.E., et al., Oncogene 1:47-58, 1987; Laemmli, U.K., Nature
227:680-685, 1970; Yarden, Y., et al., EMBO J. 6:3341-3351,
1987; Konopka, J.B., et al., Mol. Cell. Biol. 5:3116-3123,
1985). For immunoblot analysis, total lysates are prepared
(using Furth's lysis buffer) (Furth, M.E., et al., Oncogene
1:47-58, 1987). Relative protein concentrations are determined
with a colorimetric assay kit (Bio-Rad) with bovine serum
albumin as the standard. A portion of lysate containing
approximately 0.05 mg of protein is mixed with an equal volume
of 2 x SDS sample buffer containing 2-mercaptoethanol, boiled
for 5 min, fractionated on 10% polyacrylamide-SDS gels
(Konopka, J.B., et al., J. Virol. 51:223-232, 1984) and
transferred to Immobilon polyvinylidene difluoride (Millipore
Corp., Bedford, MA) filters. Protein blots are treated with
specific antipeptide antibodies (see below). Primary binding
of the PKD1-specific antibodies is detected using anti-IgG
second antibodies conjugated to horseradish peroxidase and
subsequent chemiluminescence development with the ECL Western
blotting system (Amersham International).
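The protein determination step above (a colorimetric assay read against a bovine serum albumin standard) amounts to fitting a standard curve and reading the sample off it. The sketch below assumes a linear response over the working range; the function names and any absorbance values passed to them are hypothetical and are not taken from the kit or from this specification.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through (x, y) points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to get protein concentration (mg/ml)."""
    return (absorbance - intercept) / slope


def volume_for_mass(mass_mg, conc_mg_per_ml):
    """Volume of lysate (ml) that contains the requested mass of protein."""
    return mass_mg / conc_mg_per_ml


# Hypothetical BSA standards (mg/ml) and their absorbance readings.
standards = [0.0, 0.25, 0.5, 1.0]
readings = [0.05, 0.175, 0.30, 0.55]
slope, intercept = fit_line(standards, readings)
lysate_conc = concentration_from_absorbance(0.45, slope, intercept)
load_volume_ml = volume_for_mass(0.05, lysate_conc)
```

The computed volume is then mixed with an equal volume of 2 x sample buffer, as the text describes.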

For metabolic labeling, 10^6 cells are labeled with 100 uCi of
35S-methionine in 1 ml of Dulbecco's modified Eagle's medium
minus methionine (Amersham Corp.) for 16 h.
Immunoprecipitation of PKD1 protein from labeled cells with
antipeptide antiserum is performed as described (Dymecki,
S.M., et al., supra). Portions of lysates containing 10^7 cpm
of acid-insoluble 35S-methionine are incubated with 1 ug of
the antiserum in 0.5 ml of reaction mixture.
Immunoprecipitation samples are analyzed by SDS-polyacrylamide
gel electrophoresis and autoradiography.

For immunolocalization studies, 10^7 CMK cells are resuspended
in 1 ml of sonication buffer (60 mM Tris-HCl, pH 7.5, 6 mM
EDTA, 15 mM EGTA, 0.75 M sucrose, 0.03% leupeptin, 12 mM
phenylmethylsulfonyl fluoride, 30 mM 2-mercaptoethanol). Cells
are sonicated 6 times for 10 seconds each and centrifuged at
25,000 x g for 10 min at 4°C. The pellet is dissolved in 1 ml
of sonication buffer and centrifuged at 25,000 x g for 10 min
at 4°C. The pellet (nucleus fraction) is resuspended in 1 ml
of sonication buffer and added to an equal volume of 2 x SDS
sample buffer. The supernatant obtained above (after the first
sonication) is again centrifuged at 100,000 x g for 40 min.
The supernatant (cytosolic fraction) is removed and added to
an equal volume of 2 x concentrated SDS sample buffer. The
remaining pellet (membrane fraction) is washed and dissolved
in sonication buffer and SDS sample buffer as described above.
Protein samples are analyzed by electrophoresis on 10%
polyacrylamide gels, according to the Laemmli method (Konopka,
J.B., supra). The proteins are transferred from the gels onto
a 0.45-um polyvinylidene difluoride membrane for subsequent
immunoblot analysis. Primary binding of the PKD1-specific
antibodies is detected using anti-IgG second antibodies
conjugated to horseradish peroxidase.

For immunohistochemical localization of PKD1 protein, CMK
cells or U3T3 are grown on cover slips to approximately 50%
confluence and are washed with PBS (pH 7.4) after removing the
medium. The cells are prefixed for 1 min at 37°C in 1%
paraformaldehyde containing 0.075% Triton X-100, rinsed with
PBS and then fixed for 10 min with 4% paraformaldehyde. After
the fixation step, cells are rinsed in PBS, quenched in PBS
with 0.1 and finally rinsed again in PBS. For antibody
staining, the cells are first blocked with a blocking solution
(3% bovine serum albumin in PBS) and incubated for 1 h at
37°C. The cells are then incubated for 1 h at 37°C with
antiserum (1:100 dilution) or with preimmune rabbit serum
(1:100). After the incubation with the primary antibody, the
cells are washed in PBS containing 3% bovine serum albumin and
0.1% Tween 20 and incubated for 1 h at 37°C in
fluorescein-conjugated donkey anti-rabbit IgGs (Jackson
Immunoresearch, Maine) diluted 1:100 in blocking solution.

The coverslips are washed in PBS (pH 8.0), and glycerol is
added to each coverslip before mounting on glass slides and
sealing with clear nail polish. All glass slides are examined
with a Zeiss Axiophot microscope.

An indicating group or label is preferably supplied along with
the antibody and may be packaged therewith or packaged
separately. Additional reagents such as hydrogen peroxide and
diaminobenzidine may also be included in the system when an
indicating group such as HRP is utilized. Such materials are
readily available in commerce, as are many indicating groups,
and need not be supplied along with the diagnostic system. In
addition, some reagents such as hydrogen peroxide decompose on
standing, or are otherwise short-lived like some radioactive
elements, and are better supplied by the end-user.
Pharmaceutical Compositions of the Invention; Dosage and
Administration

Pharmaceutical formulations comprising PKD1 nucleic acid or
protein, or mutants thereof, can be prepared by procedures
well known in the art, for example as injectables, e.g.,
liquid solutions or suspensions. Solid forms for solution in,
or suspension in, a liquid prior to injection also can be
prepared. Optionally, the preparation also can be emulsified.
The active ingredient can be mixed with excipients which are
pharmaceutically acceptable and compatible with the active
ingredient, for example water, saline, dextrose, glycerol,
ethanol, etc. or combinations thereof. Also useful are wetting
or emulsifying agents, pH buffering agents or adjuvants. PKD1
protein or DNA can be administered parenterally, by injection,
for example, either subcutaneously or intramuscularly.
Additional formulations which are suitable for other modes of
administration include suppositories and, in some cases, oral
formulations. In each case, the active protein or the nucleic
acid will be present in the range of about 0.05% to about 10%,
preferably in the range of about 1-2% by weight.
Alternatively, the active protein or the nucleic acid will be
administered at a dosage of about 10mg-2kg/kg body weight,
preferably 50mg-400mg/kg body weight. Administration may be
daily, weekly, or in a single dosage, as determined by the
physician.

OTHER EMBODIMENTS

Other embodiments will be evident to those of skill in the
art. It should be understood that the foregoing detailed
description is provided for clarity only and is merely
exemplary. The spirit and scope of the present invention are
not limited thereto, being defined by the claims set forth
below.

CLAIMS

1. An isolated nucleic acid sequence comprising:
(a) a PKD1 gene or its complementary strand,
(b) a sequence substantially homologous to a substantial
portion of a molecule defined in (a) above, or
(c) a fragment of a molecule defined in (a) or (b) above.

2. A sequence according to claim 1, wherein the PKD1 gene has
the nucleic acid sequence according to Figure 15.

3. A sequence according to claim 1, wherein the PKD1 gene has
the partial nucleic acid sequence according to Figure 7.

4. A sequence according to claim 1, wherein the PKD1 gene has
the partial nucleic acid sequence according to Figure 10.

5. An isolated nucleic acid selected from the group consisting
of:

(a) [OX114] a nucleic acid including a deletion of 446 base
pairs between residues 1746-2192 as defined in Figure 7;

(b) [OX32] a nucleic acid including a deletion of 135 base
pairs between residues 3696-3831 as defined in Figure 7;

(c) [OX875] a nucleic acid, wherein about 5.5kb flanked by the
two XbaI sites shown in Figure 3a are deleted and the EcoRI
site separating the CW10 (41kb) and JH1 (18kb) fragments is
thereby absent;

(d) (WS-53) a nucleic acid including a deletion of about 100kb
encompassing the PKD1 gene, wherein the 3' end of the deletion
lies between the JH1 and CW21 fragments and the 5' end of the
deletion lies between the SM6 and JH17 fragments shown in
Figure 6;

(e) (461) a nucleic acid wherein about 18 base pairs are
deleted in the 75 base pair intron amplified by the primer
pair 3A3C insert at position 3696 of the 3' sequence as shown
in Figure 11;

(f) (OX1054) a nucleic acid wherein about 20 base pairs are
deleted in the 75 base pair intron amplified by the primer
pair 3A3C insert at position 3696 of the 3' sequence as shown
in Figure 11;

(g) (WS-212) a nucleic acid including a deletion of about 75kb
downstream of the PKD1 gene and located between fragments SM9
and CW9 distal of the PKD1 gene and the PKD1 3'UTR proximal to
the PKD1 gene as shown in Figure 12;

(h) (WS-215) a nucleic acid including a deletion of about
160kb encompassing the PKD1 gene, wherein the deletion extends
3' of the PKD1 gene to within fragment CW15 and 5' of the PKD1
gene to between fragments CW10 and CW36 as shown in Figure 12;

(i) (WS-227) a nucleic acid including a deletion of about 50kb
encompassing the PKD1 gene, wherein the deletion extends 3' of
the PKD1 gene to within fragment CW20 and 5' of the PKD1 gene
to within fragment JH11 as shown in Figure

(j) (WS-219) a nucleic acid including a deletion of about 27kb
encompassing a portion of the PKD1 gene, wherein the deletion
extends 3' of the PKD1 gene within fragment JH1 and into the
PKD1 gene, to within fragment JH6 as shown in Figure 12;

(k) (WS-250) a nucleic acid including a deletion of about
160kb encompassing the PKD1 gene, wherein the deletion extends
3' of the PKD1 gene to within fragment CW20 and 5' of the PKD1
gene to within fragment BLu24 as shown in Figures 1a and 12;
and

(l) (WS-194) a nucleic acid including a deletion of about 65kb
encompassing the PKD1 gene, wherein the deletion extends 3' of
the PKD1 gene to within fragment CW20 and 5' of the PKD1 gene
to within fragment CW10.

6. An isolated nucleic acid according to any preceding claim,
wherein the molecule is an RNA transcript comprising a
sequence complementary to the coding region of the nucleic
acid sequence according to Fig. 15 and comprising a length of
about 14 kb.

7. An isolated nucleic acid according to claim 5 comprising an
RNA transcript.

8. An isolated nucleic acid according to claim 6 comprising an
RNA transcript.

9. A nucleic acid probe comprising 10 nucleotides
complementary to 10 consecutive nucleotides of the PKD1
sequence according to Figure 15.

10. A nucleic acid probe according to claim 9 wherein said
probe is between 15 nucleotides and 14 kb in length.

11. A nucleic acid probe according to claim 10, said probe
being between 100 nucleotides and 5 kb in length.

12. A recombinant expression vector comprising the isolated
nucleic acid according to claim 1.

13. A host cell comprising the vector of claim 12.

14. A recombinant expression vector comprising the isolated
nucleic acid according to claim 5.

15. A recombinant expression vector comprising the isolated
nucleic acid according to claim 7.

16. An isolated polypeptide comprising a PKD1 protein having
the amino acid sequence according to Fig. 15.

17. An isolated polypeptide comprising a PKD1 protein fragment
having the amino acid sequence according to Fig. 7.

18. An isolated polypeptide comprising a PKD1 protein fragment
having the amino acid sequence according to Fig. 10.



19. An isolated polypeptide comprising a PKD1 protein fragment
having an amino acid sequence comprising the amino acid
sequence according to Fig. 7 and the amino acid residue
deletions defined by the nucleotide deletions of claim 5,
parts (a), (b) and (j).

20. An immunoglobulin molecule having specificity for PKD1
protein, said protein comprising the amino acid sequence
according to any one of Figures 7, 10 or 15.

21. A method for screening a subject to determine whether said
subject is a PKD1-associated disorder carrier or has a
PKD1-associated disorder, which method comprises detecting the
presence or absence of PKD1 nucleic acid in a biological
sample from said subject, wherein detection of a mutant or
absent PKD1 nucleic acid is indicative of a PKD1-associated
disorder.

22. A method for screening a subject to determine whether said
subject is a PKD1-associated disorder carrier or has a
PKD1-associated disorder, which method comprises detecting the
presence or absence of PKD1 polypeptide in a biological sample
from said subject, wherein detection of a mutant or absent
PKD1 polypeptide is indicative of a PKD1-associated disorder.



23. A method according to claim 21, comprising detecting a
genomic fragment comprising the PKD1 gene or a portion
thereof, a genomic fragment comprising a flanking region of
the PKD1 gene or PKD1 RNA.

24. A method according to claim 23, wherein said detection
comprises hybridizing a PKD1 nucleic acid probe to nucleic
acid from said biological sample and comparing the results
thereof with results obtained using a biological sample from a
subject who is not a carrier of a PKD1-associated disorder.

25. A method according to claim 25, wherein said detection
includes applying a nucleic acid amplification process to said
nucleic acid to amplify a fragment of the PKD1 nucleic acid.

26. A method according to claim 26, wherein said nucleic acid
amplification process comprises amplifying a fragment of PKD1
nucleic acid utilizing a set of primers selected from the
group consisting of:

AH3 F9 : AH3 B7

3A3 C1 : 3A3 C2

AH4 F2 : JH14 B3

27. A method according to claim 24 wherein said detection step
comprises digesting nucleic acid from said biological sample
to EcoRI fragments and hybridising with a DNA probe which
hybridises to the restriction fragment in Figure 3(a) or 12.

28. A method according to claim 27, wherein nucleic acid from
said biological sample is digested with EcoR I and said DNA
probe is selected from the group consisting of the probes
CW10, JH14, JH5, JH6, JH4, JH13, JH8, JH11 and CW36 identified
in Figures 3a and 12.

29. A method according to claim 28 which comprises digesting
said nucleic acid to provide BamH I fragments and hybridising
with a DNA probe which hybridises to the BamH I fragment
identified (B) in Figure 3(a).

30. A method according to claim 30, wherein said DNA probe
comprises the DNA probe 1A1H0.6 identified herein.

31. A method of treating a patient afflicted with a
PKD1-associated disorder, comprising administering a nucleic
acid sequence according to any of claims 1 to 8.

32. A method of treating or preventing a PKD1-associated
disorder which method comprises administering to a patient in
need thereof a PKD1 gene having the sequence according to
Figure 15 so as to permit expression of PKD1 protein.

33. A method of treating or preventing a PKD1-associated
disorder which method comprises administering to a patient in
need thereof a mutated PKD1 gene isolated from WS212 DNA so as
to permit expression of PKD1 protein.

34. A diagnostic kit for amplifying a portion of the PKD1
gene, comprising a pair of nucleic acid primers complementary
to a portion of the PKD1 nucleic acid sequence according to
Fig. 15, and packaging means therefore.

35. A diagnostic kit according to claim 34, wherein the
nucleic acid primers comprise one or more of the following
sets:

AH3 F9 : AH3 B7;

3A3 C1 : 3A3 C2; and

AH4 F2 : JH14 B3.

36. A diagnostic kit for carrying out a method for determining
whether said subject is a PKD1-associated disorder carrier or
a patient having a PKD1-associated disorder, which kit
includes a nucleic acid probe capable of hybridising to a
sequence according to claim 1.

37. A diagnostic kit for carrying out a method for determining
whether said subject is a PKD1-associated disorder carrier or
a patient having a PKD1-associated disorder, which kit
includes a nucleic acid probe capable of hybridising to a
sequence according to claim 6 and packaging means therefore.

38. A diagnostic kit for carrying out a method for determining
whether said subject is a PKD1-associated disorder carrier or
a patient having a PKD1-associated disorder, which kit
includes a nucleic acid probe capable of hybridising to a
sequence according to claim 5 and packaging means therefore.

39. A diagnostic kit for detecting PKD1 nucleic acid,
including the DNA probe CW10 and packaging means therefore.

40. A diagnostic kit for detecting PKD1 nucleic acid,
including the DNA probe 1A1H0.6 and packaging means therefore.

1 cnCAAOGAGGAGCQCCIGAOGCT 60

1 LN E E P LT LA G E : E I VAQ G K R S 20 

61 GACCOGOGGAGCCTCCTGTC^ 120 

21 D.PRSLLCYGGAPGPGCHF-SI 40 

121 OOCXy\GGCmCAGa3GGGCCC^ 180 

41 PEAFSGALA N LSDVVQLITL 60 

181 GIGGACTCX^AiaXTITia^ 240 

61 VDSNPFPFGYIS.NYTVSTKV 80 

241 GCCTOGATCGCATICCAGACAC^ 300 

81 ASMAFQTQAGAQ IPIERL AS 100 

301 GAGOC^XCCATCACOG?IG^^ 360 

101 ER AITVKVPNNSDW AARG HR 120 

361 AGCTCOGCC^ 420 

421 GI^CCCTCGACAGCAGCAAC^^ 480 

141 VTLDSSNPAAG'LHL QLNYTL 160 

481 CTCGAOGGCCACTACCTOT 540 

161 LD GH Y LS E EP E P Y L A V Y LH S 180 

541 GAGCCCCGGCCCPJS^^ 600 

181 EPRPNEHNCSASRR IRPESL 200 

601 CAGGGTCCIX^CX^COGGC^ 660 

201 QGADHRPYTFFISPGSRDP.A 220 

661 GGGAGITACCATCTGAACCTC^ 720 

221 G SY.HLNLS S,HFRWSALQVSW 240 

780 
260 

781 GGGCIGClGCOOCT G GAGGAGAQCTOGOOOOGQCftG^ 840 

261 G.L L P L E E T -S P R Q A- V C L T R H L 280 

rioj ^T niJ iGTriajita G 900 

P-SH. VRFVFPE 300 

901 CCGAC&GOGGAIIGrAAACTACATO^ 960 

301 PTADVNY I.VMLTC AVCLVTY 320 

961 ATCGICATGGGCECCATCC^^ 1020 

321 MVMAAILHKLDQ/LDASRGRA 340 

1021 ATCCCITIXnX3IGGGCAGOGGGGOC^ 1080 

341 IPFC.GQRGRFKYE. ILVKTGW 360 

1081 GGCCEGGGCTCAGCTACC^^ 1140 

361 G RG S G T TA.HVG I M" L Y G V D S R 380 

1141 AGOSGCCACOGGCACCIGGAOGGOGACA^ 1200 

381 SGHRHLDGD RAFHRNS.L DIF 400 

1201 CGGPZV2CCh<X!^^ 1260 

401 R I A T P H S L G S V W K>I R.V W H D N 420 
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K G L S P A W T-L-Q H. 'V/ I V R D L Q T A 
OGCAGOGCCITCITCC^^ 

R S A' -F F L; V N " D W " LvS V E T\ E - A- N G G 
CIGCJIGGAGAAGGAGGTCC^^ 

L V E K E V L A A S :D A A L L R F R R L 
CTCGTCGCIGAGCIGCAGCCjIG^ 

L V ' A E L Q R G F F D K H T W L S I W D 

R P P R S R F T R f Q" R A T C . C V L L I 
TCCCTCTIXXrrcGGCGCCAA 

C L F L G A N A V W Y G- A V G D S . ,A Y S 



A<XGGGCATCTGIU3*GGCIGAG 

TGH VSRLSPL SVD TVAV GL V 
TOCAGOCTIGGTTCIUTATCCCX^ 

S S V V V Y P V Y LA I , L F L F R M S R 

■? - * 

AGCAAGGTCGCIGGGAGOOOGAGCoixCACA 

S K V ' A G S P S P T P A G Q Q V L D I D 

AGcrcccia^ciancaGriGci^ 

S C L D S. S V L,- D S S F L T* F : S G L H A 
GAGGQ OlTltaTllj GACAGATCAAG&GIGA ^ 

E A : F V G Q M _K S ,D L- F L D D S K S L V 
TGCIGGCXXroXXMGAGC^ 

C n W "P S G E G T L S ... W R JD L.L S D P S I 
GIGGCTAGCAATCTODGGCA^^ 

V G S - N L R Q L A R G Q A G H G L G P E / 
GAGGAOGGCITCTOOCIGGCCAGC^ 

E D G F S L A S P Y S~ P A K S F S A S D 
GAAGAOCTCATOCAGGAGCTCXZTTCC^ 

E D L I- Q Q V L A -E . G V S S P A P T. Q D 
ACCCACATCGAMOGGACCIXX^^ 

T H M E T D L L S S L S S--T P G" E K T E 
A(X3CTGGaXTCklAGAGGCTCGGGGAG 

T * L A L O. R " L G ■ E -L G P P S P .* G L N W E 
Q P Q A A; R L S R T G L V- E G L .R K R L 



CraXGGCCTCGTCTCCCT 

L P. A W C A S L A H G L S - L L L V A V A 
GTCGCKHXiriX^GGGTCGGIXj^ 

V A, V S G W. V G A S F. P P G V S V A W - L 
CTCIOCAGCAGCrXXAGCTK^^ 

L^S'S. SA. SFLA S_ F L - G W E- • P IT- K V L .
Figure 7 cont'd

L E-.A..L Y F S _ L V,A K . R^ L H P D E D D T 
CTQCTAGAGAGCOCX3GCT 

L V E S P A V T P V S A R V P R V R P P 
CACEGCTITCCACTCTrcCT 

H G.F A L F LA K E E A R K V K R L H G 
ATGCTCCPGAGCXITCCT^^ 

M^LRS L LVYMLFLLVTLL AS Y 
GGGGATGCCTCATGOCAIG^ 

GD^ASCHGHAYRLQSAI KQ EU 
H, S^.. R A FLA I T R -S- E E L W P W M A H 

V LLPYVHGNQS SP ELG PPRL 

■<.... <. 

OOSCAGCnTXISGCTGCA^ 

R QVR L QEALYPDPPGPRVHT 
C . S A . A G G F _ S T S_D Y D V G W ESP H 
N . G S .G T\W A.YSAPDLLGAWSWG 



S C. A. 



\OGTGCAGGAGCIGGGOCIGAGOCTGGAGGAG 
„V Y D S G G Y V Q E L G L S L E E 



S.R D R LR FLQLH NWL DN R S R A 



GIGITCCIGGAGCTCA^^ 

VF LELTRY SPAVGLHAAVTL 
OXXTCXSAGITCXXXS^ 

RLEFFAAGRALAALSVRPfA 



CIGaCKXXXCTC^GOaaXSGC^^ 

LRRLSAGLS-L.PLLTS.VCLLL 
TiaaaXTIGCACITOX^^ 

FA VHFAVAEARTWHREGRWR 
VLRLGAWARWLLVALTAATA 

CTQGTAOGCcraaxc^ 

LVRLAQLGAADRQWTRFVRG 
CX^CGOGOTXTIX^CTAGCT^ 

RPRRFTS FDQVAHVS SA AR G 
LAASLLFLLLVK AA QHV RFV 



R Q W S V 



FGKTL CRAL'PELLGV
Figure 7 cont'd

5101 raGGGCTGAC^^ 5160 

5161 TCIGCTGGCGITATC^ 5220 

5221 CTGGGGGCACAGCIGri^^ - 5280 

5281 TGCXECMGCC^^ 5340 

5341 CTAGCAGGACTAGGCATO 5400 

5401 GGGCIO^axaGGCTGGA 5460 

5461 GOGACIGIGCTCTAia^ 5520 

5521 TCTCTAOCACT^^ 5580 

5581 AOCMGCAGKCM&GItt 5631 
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T L G L VVL GVAYAQLAILLVS 
TOTTCIGIGGACTCXXTC^ 

S r C V D S L • W> S .V, A'. Q A:/'L L V. L C; P G , T 

GGGCICICTACCCICT^^ „ 
GLSTLCPAESWHLSPLLCVG 

CTCTOGGCACIGOGGCTCr^^ 

L W - A -X R L W-G <A L R' fL G A: V I L R W R 
TAOCAOGOTTCCinG^^ 

YHALRGELYRPAWEPQDY EM 

V^^L^F 1 ^ V K E F 

03XAC»AACTXGCTIT^^ 

RHKVRFEGMEPLPSRSSRGS 
AAGGIA0XXC0O3ATG?IG0^^ 

K/ V *S P D V P E'P S A.G..S. D: A S H P, S T 



. TCGKXAGCCAGCia^ATCGGCI^^ 
S S S - Q L D G L -S V S Ij . G R L G. T.. R O E 



P E P'l S R 



kCCCAGTITCACCEACTC 
L Q A V. F E_A X L T Q J D R;L 



AAOCAGGCCACASAGGAOTC^ 

NQATEDVYQLEQQLH S; L Q G R 
AGGAGCAGCnmXSCOaSOOQGfiim 

RSSRAPAGSSRGPSPGLRPA 
CIGOCCAGOCXXXTIGCCXXSGGO^ 

LPSRLARASRGVDLATGPSR 



ACACCTTOGGGCCAAGAAGAAGGTCCACC^ 

T P S G Q E Q G P. P Q Q. H X V. . L L P G G 
GGTO3GOO?IGGAGia^ACnGGA» 

GGPWSRS GHRSVLLSAAVKA 
GAGGGOC^GGCAGAATGGCTGCAaC?^^ 

EG QAEWLHVGSPESRQGHLS 
GIXriCIX^GCTTCAGCACTITA^ 

VCGLQHFKEAV W P T R T Q G P L 



CCCAGCKXXTIGGGAAGGACACAGCACT 
PSSLGKDTAVLDGF 

TITATTTCCraGAC^^ 

GTDOOOCACTOCI3W3GCIGCTO 

OXCTAA^l'm'lACCTCTCCA^^ 

TanXTTCACHMTITATATO^^
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: (Compare Fig. 1)

C GGC GOC GOC TOC CGC CTC AAC TGC TOG GGC OGC GOG CTG OGG ACG . 46 

Gly Ala Ala Cys Axg Val Asn Cys Ser Gly Arg Gly Leu Arg Thr 
1 5 10 * 15 

CTC GGT OOC GOG CTG OGC ATC COC GOG GAC GOC ACA GOG CTA GAC GTC 94 
Leu Gly Pro Ala Leu Arg lie Pro Ala Asp Ala Thr Ala Leu Asp Val 
20 25 30 

TOC CAC AAC CTC CTC. GGG GOG CTG GAC (JIT GGG CTC CTG GOG AAC CTC 142 
Ser His Asn Leu Leu Arg Ala Leu Asp Val Gly Leu Leu Ala Asn Leu 
35 40 45 

TOG GOG CTG GCA GAG CTG GAT ATA AGC AAC AAC AAG ATT TOT AOG TTA 190 

Ser Ala Leu Ala Glu Leu Asp He Ser Asn Asn Lys He Ser Thr Leu 
50 55 60 

GAA GAA GGA ATA TTT GOT AAT TTA TIT AAT TTA' AGT GAA ATA AAC CTG 238 
Glu Glu Gly Ile Phe Ala Asn Leu Phe Asn Leu Ser Glu Ile Asn Leu
65 70 75 

ACT GGG AAC COG TIT GAG TGT GAC TGT GOC CTG GOG TOG CTG COG OGA 286 
Ser Gly Asn Pro Phe Glu Cys Asp Cys Gly Leu Ala Trp Leu Pro Arg 
80 85 90 95

TOG GOG GAG GAG CAG CAG GTG COG GTC GTC CAG COC GAG GCA GOC AOG 334 
Trp Ala Glu Glu Gln Gln Val Arg Val Val Gln Pro Glu Ala Ala Thr
100 105 110 

TGT GOT GGG OCT GGC TOC. CTG OCT GGCI CAG OCT CTG CTT GGC ATC GOC 382 
Cys Ala Gly Pro Gly Ser Leu Ala Gly Gln Pro Leu Leu Gly Ile Pro
115 120 125

TIG CTG GAC. ACT GGC TOT GGT GAG GAG TAT GIC - GOC TOC CTC OCT GAC 430 
Leu Leu Asp Ser Gly Cys Gly Glu Glu Tyr Val Ala Cys Leu Pro Asp
130 135 140 

AAC AGC TCA GGC ACC CTG GCA GCA GTG TOC TTT TCA GOT GOC CAC GAA 478 
Asn Ser Ser Gly Thr Val Ala Ala Val Ser Phe Ser Ala Ala His Glu 
145 150 155 

GGC CTG CTT CAG OCA GAG GOC TOC AGC GOC TIC TOC TIC TOC ACC GGC 526 
Gly Leu Leu Gln Pro Glu Ala Cys Ser Ala Phe Cys Phe Ser Thr Gly
160 165 170 175

CAG GGC CTC GCA GOC CTC TOG GAG CAG GGC TOG TOC CTG TCT GGG GOG 574 
Gln Gly Leu Ala Ala Leu Ser Glu Gln Gly Trp Cys Leu Cys Gly Ala
180 185 190 

GOC CAG OCC TOC ACT GOC TOC TTT GOC TOC CTC TOC CTC TOC TOC GGC 622 
Ala Gln Pro Ser Ser Ala Ser Phe Ala Cys Leu Ser Leu Cys Ser Gly
195 200 205

00C COG OCA OCT OCT GOC OOC AOC TGT AGG GGC COC AOC CTC CTC CAG 670 
Pro Pro Pro Pro Pro Ala Pro Thr Cys Arg Gly Pro Thr Leu Leu Gln
210 215 220

CAC GTC TIC OCT GOC TOC OCA GGG GOC AOC CTC' GTG GGG COC CAC GGA 718 
His Val Phe Pro Ala Ser Pro Gly Ala Thr Leu Val Gly Pro His Gly 
225 230 235 
OCT CTG GOC TCT GGC CAG CTA OCA GOC TIC CAC ATC GOT GOC COG CTC 766
Pro Leu Ala Ser Gly Gln Leu Ala Ala Phe His Ile Ala Ala Pro Leu
240 245 250 255

OCT GTC ACT GOC ACA OGC TOG GAC TTC GGA GAC GGC TOC GOC GAG GIG 814 
Pro Val Thr Ala Thr Arg Trp Asp Phe Gly Asp Gly Ser Ala Glu Val 
260 265 270

GAT GOC GOT GGG COG OCT GOC TOG CAT OGC TAT GIG CTG OCT GGG OGC 862 
Asp Ala Ala Gly Pro Ala Ala Ser His Arg Tyr Val Leu Pro Gly Arg
275 280 285

TAT CAC GTG AOG GOC GTG CTG GOC CTG GGG GOC GGC TCA GOO CTG CTG 910 
Tyr His Val Thr Ala Val Leu Ala Leu Gly Ala Gly Ser Ala Leu Leu
290 295 300

GGG ACA GAC GTG CAG GTG GAA GOG GCA OCT GOC GOC CTG GAG CTC GIG 958 
Gly Thr Asp Val Gln Val Glu Ala Ala Pro Ala Ala Leu Glu Leu Val
305 310 315

TGC OOG TOC TOG GTG CAG ACT GAC GAG AGC CTT GAC CTC AGC ATC CAG . 1006 
Cys Pro Ser Ser Val Gln Ser Asp Glu Ser Leu Asp Leu Ser Ile Gln
320 325 330 335

AAC OGC GGT GGT TCA GGC CTG GAG GOC GOC TAC AGC ATC GTG GOC .CTG . 1054 
Asn Arg Gly Gly Ser Gly Leu Glu Ala Ala Tyr Ser Ile Val Ala Leu
340 345 350

GGC GAG GAG OOG GOC OGA GOG GIG CAC COG CTC TGC OCC TOG GAC AOG . 1102 
Gly Glu Glu Pro Ala Arg Ala Val His Pro Leu Cys Pro Ser Asp Thr
355 360 365

GAG ATC TTC OCT GGC AAC GGG CAC TGC TAC OGC CTG GTG GTG GAG AAG 1150 
Glu Ile Phe Pro Gly Asn Gly His Cys Tyr Arg Leu Val Val Glu Lys
370 375 380

GOG GOC TOG CTG CAG GOG CAG GAG CAG TCT CAG GOC TOG GOC GGG GOC 1198 
Ala Ala Trp Leu Gln Ala Gln Glu Gln Cys Gln Ala Trp Ala Gly Ala
385 390 395

GOC CTG GCA ATG GIG GAC ACT OOC GOC GTG CAG OGC TTC CTG GTC TOC 1246 
Ala Leu Ala Met Val Asp Ser Pro Ala Val Gln Arg Phe Leu Val Ser
400 405 410 415

OGG GTC AOC AGG AGC CTA GAC GTG TGG ATC GGC TTC TOG ACT GTG CAG 1294 
Arg Val Thr Arg Ser Leu Asp Val Trp Ile Gly Phe Ser Thr Val Gln
420 425 430

GGG GTG GAG GIG GGC OCA GOG COG CAG GGC GAG GOC TTC AGC CTG GAG 1342 
Gly Val Glu Val Gly Pro Ala Pro Gln Gly Glu Ala Phe Ser Leu Glu
435 440 445

AGC TGC CAG AAC TGG CTG OOC GGG GAG OCA CAC OCA GOC ACA GOC GAG 1390 
Ser Cys Gln Asn Trp Leu Pro Gly Glu Pro His Pro Ala Thr Ala Glu
450 455 460

CAC TOC GTC OGG CTC GGG OOC AOC GGG TGG TCT AAC AOC GAC CTG TGC 1438 
His Cys Val Arg Leu Gly Pro Thr Gly Trp Cys Asn Thr Asp Leu Cys
465 470 475
TCA GOG OOG CAC AGO TAG GTC TGC GAG CTG CAG COC GGA GGC OCA GIG 1486
Ser Ala Pro His Ser Tyr Val Cys Glu Leu Gln Pro Gly Gly Pro Val
480 485 490 495

CAG GAT GOC GAG AAC CIC CTC GTC GGA GOG COC ACT GGG GAC CTG CAG 1534
Gln Asp Ala Glu Asn Leu Leu Val Gly Ala Pro Ser Gly Asp Leu Gln
500 505 510

GGA COC CTG A(!X3 OCT CTG GCA CAG CAG GAC GGC CTC TCA GOC OOG CAC 1582
Gly Pro Leu Thr Pro Leu Ala Gln Gln Asp Gly Leu Ser Ala Pro His
515 520 525

GAG COC GTC GAG GTC ATG GTA TIC OOG GGC CTC OGT CTG AGC OCT GAA 1630
Glu Pro Val Glu Val Met Val Phe Pro Gly Leu Arg Leu Ser Arg Glu
530 535 540

[...] GGG ACC CAG GAG CTC OGG OGG COC 1678
[...] Gly Thr Gln Glu Leu Arg Arg Pro
545 550 555

[...] OOG CTC CTC AGC ACA GCA GGG AOC 1726
[...] Arg Leu Leu Ser Thr Ala Gly Thr
560 565 570 575

[...] AGC AGG T0C COG GAC AAC AGG ADC 1774
[...] Ser Arg Ser Pro Asp Asn Arg Thr
580 585 590

[...] G9G GGA OGC TOG TGC OCT GGA GOC 1822
[...] Gly Gly Arg Trp Cys Pro Gly Ala
595 600 605

[...] TCT TGC CAC O0C CAG GOC TGC GOC 1870
[...] Ser Cys His Pro Gln Ala Cys Ala
610 615 620

[...] CTA COC GGG GOC COC TAT GOG CTA 1918
[...] Leu Pro Gly Ala Pro Tyr Ala Leu
625 630 635

[...] G0C GOG GGG COC COC GOG CAG TAC 1966
[...] Ala Ala Gly Pro Pro Ala Gln Tyr
640 645 650 655

[...] GTC CTC ATG CTC OCT OCT GAC CTC 2014
[...] Val Leu Met Leu Pro Gly Asp Leu
660 665 670

[...] OCT GGC GOC CIC CTG CAC TGC TOG 2062
[...] Pro Gly Ala Leu Leu His Cys Ser
675 680 685

[...] CAG GOC OOG TAC CTC T0C GOC AAC 2110
[...] Gln Ala Pro Tyr Leu Ser Ala Asn
690 695 700

[...] CCA G0C CAG CTG GAG GGC ACT TOG 2158
[...] Pro Ala Gln Leu Glu Gly Thr Trp
705 710 715
GOC TGC OCT GOC TCT GOC CTG OGG CIG CTT GCA GOC AOG GAA CAG CTC 2206
Ala Cys Pro Ala Cys Ala Leu Arg Leu Leu Ala Ala Thr Glu Gln Leu
720 725 730 735

AOC GIG CIG CIG GGC TTG AGG GOC AAC OCT GGA CTG OGG Ate'" OCT GOG '* 2254 
Thr Val Leu Leu Gly Leu Arg Pro Asn Pro Gly Leu Arg Met Pro Gly
740 745 750

OGC TAT GAG CTC OGG GCA GAG GIG GOC AAT GGC GIG TOC AGG CAC AAC ' 2302 
Arg Tyr Glu Val Arg Ala Glu Val Gly Asn Gly Val Ser Arg His Asn 

755 760 765

CTC TOC TGC AGC TTT GAC GIG GIC TOC OCA GIG GCT GGG CIG OGG GIC 2350 
Leu Ser Cys Ser Phe Asp Val Val Ser Pro Val Ala Gly Leu Arg Val
770 775 780

ATC TAC OCT GOC 00C OGC GAG GGC OGC CTC TAC GIG. O0C AOC AAC GGC 2398 
Ile Tyr Pro Ala Pro Arg Asp Gly Arg Leu Tyr Val Pro Thr Asn Gly
785 790 795

TCA GOC TIG GIG CTC CAG GIG GAC TCT GGT GOC AAC GOC AOG GOC AOG ' 2446 
Ser Ala Leu Val Leu Gln Val Asp Ser Gly Ala Asn Ala Thr Ala Thr
800 805 810 815

GCT OGC TGG OCT GGG GGC ACT GIC AGO GOC OGC TTT GAG AAT GIC TGC * * 2494 
Ala Arg Trp Pro Gly Gly Ser Val Ser Ala Arg Phe Glu Asn Val Cys 
820 825 830

OCT GOC CTG GTG GOC AOC TTC GTG OOC GGC TGC 000 TGG GAG ACT AAC 2542 
Pro Ala Leu Val Ala Thr Phe Val Pro Gly Cys Pro Trp Glu Thr Asn 

835 840 845

GAT AOC CTG TIC TCA GTG GTA GCA CIG COG TGG CTC ACT GAG GGG GAG "2590 
Asp Thr Leu Phe Ser Val Val Ala Leu Pro Trp Leu Ser Glu Gly Glu 

850 855 860

CAC GIG GIG GAC GIG GIG GIG GAA AAC AGC GOC AGC OGG GOC AAC CTC 2638 
His Val Val Asp Val Val Val Glu Asn Ser Ala Ser Arg Ala Asn Leu 
865 870 875

AGC CIG OGG GIG AOG GOG GAG GAG OCT ATC TCT GGC CTC GGC GOC AOG * 2686 

Ser Leu Arg Val Thr Ala Glu Glu Pro Ile Cys Gly Leu Arg Ala Thr
880 885 890 895

OOC AGC 00C GAGi GOC OCT GTA CIG CAG GGA GTC CTA GIG AGG TAC AGC ' 2734 
Pro Ser Pro Glu Ala Arg Val Leu Gln Gly Val Leu Val Arg Tyr Ser
900 905 910

OOC GIG GIG GAG GOC GGC TOG GAC ATG GIC TTC OGG TGG AOC Alt AAC 2782 
Pro Val Val Glu Ala Gly Ser Asp Met Val Phe Arg Trp Thr Ile Asn
915 920 925

GAC AAG CAG TOC CTG AOC TIC CAG AAC GIG GIC TTC AAT GTC ATT TAT " ' 2830 
Asp Lys Gln Ser Leu Thr Phe Gln Asn Val Val Phe Asn Val Ile Tyr
930 935 940

CAG AGC GOG GOG GIC TIC AAG CTC TCA CTG AOG GOC TOC AAC CAC GTG - 2878 
Gln Ser Ala Ala Val Phe Lys Leu Ser Leu Thr Ala Ser Asn His Val
945 950 955 
AGC AAC GTC AOC GIG AAC TAC AAC GTA AOC GTC GAG GGG ATG AAC AGG 2926
Ser Asn Val Thr Val Asn Tyr Asn Val Thr Val Glu Arg Met Asn Arg
960 965 970 975 

ATG CAG GGT CTG CAG GTC TOC ACA GTC COG GOC GIG CTG TOC COC AAT '2974 
Met Gln Gly Leu Gln Val Ser Thr Val Pro Ala Val Leu Ser Pro Asn
980 985 990

GOC ACA CTG GTA CTG AOG GGT GCT GTG CTG GTG GAC TCA GCT GTG GAG 3022 
Ala Thr Leu Val Leu Thr Gly Gly Val Leu Val Asp Ser Ala Val Glu
995 1000 1005 

GTC GOC TIC CTG TGG AAC TTT GGG GAT GGG GAG CAG GOC CTC CAC CAG ;~. 3070 
Val Ala Phe Leu Trp Asn Phe Gly Asp Gly Glu Gln Ala Leu His Gln
1010 1015 1020 

TTC CAGaTTOOGTACAACGAGTOCTICCnSG^ . 3118 

Phe Gln Pro Pro Tyr Asn Glu Ser Phe Pro Val Pro Asp Pro Ser Val
1025 1030 1035 

GOC CAG GIG CTG GTG GAG CAC AAT GTC ATG CAC ACC TAC GCT GOC CCA 3166 
Ala Gln Val Leu Val Glu His Asn Val Met His Thr Tyr Ala Ala Pro
1040 1045 1050 1055 

GGT GAG TAC CTC CTG AGC GTG CTG GCA TCT. AAT GOC TTC GAG AAC CTG 3214 
Gly Glu Tyr Leu Leu Thr Val Leu Ala Ser Asn Ala Phe Glu Asn Leu
1060 1065 1070 

AOG CAG CAG CTG OCT GIG AGC GIG CGC GOC TOC CTG CCC TOC GTG GCT 3262 
Thr Gln Gln Val Pro Val Ser Val Arg Ala Ser Leu Pro Ser Val Ala
1075 1080 1085 

CTG GCT GIG ACT GAC GGC GIC CTG GIG GOC GGC CGG CCC GTC AGC TIC 1 . 3310 
Val Gly Val Ser Asp Gly Val Leu Val Ala Gly Arg Pro Val Thr Phe
1090 1095 1100 

TAC COG CAC OOG CTG COC TOG OCT GGG GCT GIT CTT TAC AOG TGG GAC 3358 
Tyr Pro His Pro Leu Pro Ser Pro Gly Gly Val Leu Tyr Thr Trp Asp
1105 1110 1115 

TTC GGG GAC GGC TOC OCT GIC CTG AOC CAG AGC CAG O0G GCT GOC AAC 3406 
Phe Gly Asp Gly Ser Pro Val Leu Thr Gln Ser Gln Pro Ala Ala Asn
1120 1125 1130 1135 

CAC AOC TAT GCC TOG AGG GGC. ACC TAC CAC GIG CGC CTG GAG GTC AAC 3454 
His Thr Tyr Ala Ser Arg Gly Thr Tyr His Val Arg Leu Glu Val Asn
1140 1145 1150 

AAC ACG GIG AGC GCT GOG GOG GOC CAG GOG GAT- GTG CGC GTC TTT GAG 3502 
Asn Thr Val Ser Gly Ala Ala Ala Gln Ala Asp Val Arg Val Phe Glu
1155 1160 1165 

" GAG CTC CGC GGA CTC AGC GTC GAC ATG AGC CTG GOC GIG GAG CAG GGC 3550 
Glu Leu Arg Gly Leu Ser Val Asp Met Ser Leu Ala Val Glu Gln Gly
1170 1175 1180 

GOC COC GIG GIG GIC AGC GOC GOG GIG CAG AOG GGC GAC AAC ATC AOG - 3598 
Ala Pro Val Val Val Ser Ala Ala Val Gln Thr Gly Asp Asn Ile Thr
1185 1190 1195 
TGG ADC TIC GAC ATC GGG GAC GGC ADC CTG CTG TOG GGC DOG GAG GCA 3646
Trp Thr Phe Asp Met Gly Asp Gly Thr Val Leu Ser Gly Pro Glu Ala
1200 1205 1210 1215

. ADA GIG GAG CAT GTG TAC CTG OGG GCA CAG AAC TGC ACA GTG ADC GIG . 3694 
Thr Val Glu His Val Tyr Leu Arg Ala Gin Asn Cys Thr Val Thr Val 
1220 1225 1230

GGT GOG GOC AGC COC GOC GGC CAD CTG GOC OGG ADC CTG CAD GTG CTG 3742 
Gly Ala Ala Ser Pro Ala Gly His Leu Ala Arg Ser Leu His Val Leu
1235 1240 1245

GIC TIC GIC CTG GAG GIG CTG OGC GIT GAA OOC GOC GOC TGC ATC OOC . 3790 
Val Phe Val Leu Glu Val Leu Arg Val Glu Pro Ala Ala Cys Ile Pro
1250 1255 1260 

AOG CAG OCT GAC GOG OGG CTC AOG GOC TAD GIC ADC GGG AAC COG GOC, . . 3838 
Thr Gln Pro Asp Ala Arg Leu Thr Ala Tyr Val Thr Gly Asn Pro Ala
1265 1270 1275

CAD TAD CTC TIC GAC TGG..ADC TIC GGG GAT GGC TCC TOC AAC AOG ADC ^ .3886 
His Tyr Leu Phe Asp Trp Thr Phe Gly Asp Gly Ser Ser Asn Thr Thr
1280 1285 1290 1295

GTG OGG GGG TGC OOG ADG CTG ACA CAD AAC TIC ADG OGG AGC GGC, ADG . 3934 
Val Arg Gly Cys Pro Thr Val Thr His Asn Phe Thr Arg Ser Gly Thr
1300 1305 1310

TIC OOC CTG GOG CTG CTG CTG TOC AGO. OGC GTG AAC AGG GOG CAT TAD. , . 3982 
Phe Pro Leu Ala Leu Val Leu Ser Ser Arg Val Asn Arg Ala His Tyr
1315 1320 1325

. TTC ADC AGC, ATC. TGC, GIG. GAG OCA GAG GIG GGC AAC GIC ADC CTG CAG 4030 
Phe Thr Ser Ile Cys Val Glu Pro Glu Val Gly Asn Val Thr Leu Gln
1330 1335 1340

r OCA GAG AGG. CAG TIT GIG CAG CIC GGG GAC GAG GOC TGG CTG GIG GCA 4078 
Pro Glu Arg Gln Phe Val Gln Leu Gly Asp Glu Ala Trp Leu Val Ala
1345 1350 1355

TGT GOC TGG. OOC OOG TTC OOC TAD OGC TAD ADC TGG GAC TTT GGC ADC ,4126 
Cys Ala Trp Pro Pro Phe Pro Tyr Arg Tyr Thr Trp Asp Phe Gly Thr
1360 1365 1370 1375

GAG GAA GOC GCC OOC ADC OGT GOC AGG GGC OCT GAG . GIG AOG TIC , ATC ... 4174 
Glu Glu Ala Ala Pro Thr Arg Ala Arg Gly Pro Glu Val Thr Phe Ile
1380 1385 1390

TAC OGA GAC OCA GGC TOC TAT CIT GIG ACA ( GIC ADC GOG TCC AAC AAC .. 4222 
Tyr Arg Asp Pro Gly Ser Tyr Leu Val Thr Val Thr Ala Ser Asn Asn
1395 1400 1405

ATC TCT GCT GOC AAT GAC TCA GOC CTG GIG GAG. GIG CAG GAG OOC GIG > 4270 
Ile Ser Ala Ala Asn Asp Ser Ala Leu Val Glu Val Gln Glu Pro Val
1410 1415 1420 

CTG GIC ADC AGC ATC, AAG GIC AAT GGC TOC CIT GGG CTG GAG CTG CAG 4318 
Leu Val Thr Ser Ile Lys Val Asn Gly Ser Leu Gly Leu Glu Leu Gln
1425 1430 1435
CAG GOG TAC CTG TIC TCT GCT GTG GGC OCT GQG OGC OOC GOC AGC TAC 4366
Gln Pro Tyr Leu Phe Ser Ala Val Gly Arg Gly Arg Pro Ala Ser Tyr
1440 1445 1450 1455 

CTG TOG GAT CTG GQG GAC GCT GQG TOG CTC GAG GCT OOG GAG GIC ADC 4414 
Leu Trp Asp Leu Gly Asp Gly Gly Trp Leu Glu Gly Pro Glu Val Thr
1460 1465 1470

CAC GCT TAC AAC AGC ACA OCT GAC TIC AOC GIT AGG GIG GOC GGC TOG 4462 
His Ala Tyr Asn Ser Thr Gly Asp Phe Thr Val Arg Val Ala Gly Trp 
1475 1480 1485 



AAT GAG GIG AGC CGC AGC GAG GOC TOG CTC AAT GIG AOG GTC AAG OOG . . 4510 
Asn Glu Val Ser Arg Ser Glu Ala Trp Leu Asn Val Thr Val Lys Arg 
1490 1495 1500

- OGC GIG CGG GQG CTC GTC GTC AAT GCA AGC CGC AOG GTG GIG OOC CTG 4558 
Arg Val Arg Gly Leu Val Val Asn Ala Ser Arg Thr Val Val Pro Leu 
1505 1510 1515

. AAT GOG AGC CTC AGC TIC AGC AOG TOG CIC . GAG GOC GGC ACT GAT GIG 4606 
Asn Gly Ser Val Ser Phe Ser Thr Ser Leu Glu Ala Gly Ser Asp Val
1520 1525 1530 1535 

OQC TAT TOC TOG GTC CIC TGT GAC,CQC TOC AOG OOC ATC OCT GGG GCT 4654 
Arg Tyr Ser Trp Val Leu Cys Asp Arg Cys Thr Pro Ile Pro Gly Gly
1540 1545 1550

OCT AOC ATC TCT TAC AGC TIC. OGC TOC GIG. GGC AOC TIC AAT ATC ATC .4702 
Pro Thr Ile Ser Tyr Thr Phe Arg Ser Val Gly Thr Phe Asn Ile Ile
1555 1560 1565

GTC AOG GCT GAG AAC GAG GTG GGC TOC GOC CAG GAC AGC ATC TIC GTC _ 4750 
Val Thr Ala Glu Asn Glu Val Gly Ser Ala Gln Asp Ser Ile Phe Val
1570 1575 1580

TAT GTC CTG CAG CIC ATA GAG GQG CTG CAG GIG GTC GGC GCT GGC OGC 4798 
Tyr Val Leu Gln Leu Ile Glu Gly Leu Gln Val Val Gly Gly Gly Arg
1585 1590 1595

TAC TTC OOC AOC AAC CAC AOG GTA CAG CTG CAG GOC GTG GTT AGG GAT 4846 
Tyr Phe Pro Thr Asn His Thr Val Gln Leu Gln Ala Val Val Arg Asp
1600 1605 1610 1615

GGC AOC AAC GTC TOC TAC AGC TGG ACT GOC TOG AGG GAC AGG GGC OOG 4894 
Gly Thr Asn Val Ser Tyr Ser Trp Thr Ala Trp Arg Asp Arg Gly Pro
1620 1625 1630

GOC CIC GOC GGC AGC GGC AAA GGC TTC TOG CIC AOC GTC CTC GAG GOC 4942 
Ala Leu Ala Gly Ser Gly Lys Gly Phe Ser Leu Thr Val Leu Glu Ala
1635 1640 1645

GGC AOC TAC CAT GTG CAG CTG OGG GOC AOC AAC ATC CTG GOC AGC GOC 4990 
Gly Thr Tyr His Val Gln Leu Arg Ala Thr Asn Met Leu Gly Ser Ala
1650 1655 1660

TGG GOC GAC TGC AOC ATC GAC TTC GIG GAG OCT GTC GGG TGG CTG ATC 5038 
Trp Ala Asp Cys Thr Met Asp Phe Val Glu Pro Val Gly Trp Leu Met 
1665 1670 1675
GTG ADC GOC TOC OOGAAC CCA OCT GOC GIC AAC ACA AOC GTC AOC CTC 5086
Val Thr Ala Ser Pro Asn Pro Ala Ala Val Asn Thr Ser Val Thr Leu
1680 1685 1690 1695

ACT GOC GAG CTC GCT GGT GOC ACT GGT CTC GTA TAC ACT TOG TOC TTG 5134 
Ser Ala Glu Leu Ala Gly Gly Ser Gly Val Val Tyr Thr Trp Ser Leu
1700 1705 1710

GAG GAG GGG CTG AGC TGG GAG AOC TOC GAG CCA TIT. AOC ADC CAT AGC . 5182 
Glu Glu Gly Leu Ser Trp Glu Thr Ser Glu Pro Phe Thr Thr His Ser 
1715 1720 1725

TIC GOC AGA 00C GGC CIG CAC TIG CTC ADC ATG ADG GCA GGG AAC DOG 5230 
Phe Pro Thr Pro Gly Leu His Leu Val Thr Met Thr Ala Gly Asn Pro
1730 1735 1740

-CTC GGC TCA GOC AAC GOC A& GIG GAA GIG GAT GTG GAG' GIG OCT GIG - 5278 
Leu Gly Ser Ala Asn Ala Thr Val Glu Val Asp Val Gln Val Pro Val
1745 1750 1755 

ACT GGC CTC AGC ATC AGG GOC AGC GAG OCT GGA^GGC AGC TIC GTG. GOG r .. 5326 
Ser Gly Leu Ser Ile Arg Ala Ser Glu Pro Gly Gly Ser Phe Val Ala
1760 1765 1770 1775

GOC GGG TOC TUT GTG OOC TIT TGG GGG CAG, CTG; GOC ADG GGC ACC AAT .5374 
Ala Gly Ser Ser Val Pro Phe Trp Gly Gln Leu Ala Thr Gly Thr Asn
1780 1785 1790 

GTG AGC TGG TOC TGG GCT, CTC. OCC^ GGC GGC AGC AGC AAG OCT GGC OCT , 5422 
Val Ser Trp Cys Trp Ala Val Pro Gly Gly Ser Ser Lys Arg Gly Pro
1795 1800 1805

CAT CTC ADC. ATG GTC TTC COG. GAT GCTGGC ACr'TTC TOC ATC Q33?CTC , 5470 
His Val Thr Met Val Phe Pro Asp Ala Gly Thr Phe Ser Ile Arg Leu
1810 1815 1820 

AAT GOC TOC AAG GCA GTC AGC TGG GTC TCA GOC ADG ' TAG AAC CTC ADG 5518 
Asn Ala Ser Asn Ala Val Ser Trp Val Ser Ala Thr Tyr Asn Leu Thr
1825 1830 1835

GOG GAG GAG COC ATC GTG GGC CTG GIG CTC TOG GOC AGC AGC AAG GTG : . 5566 
Ala Glu Glu Pro Ile Val Gly Leu Val Leu Trp Ala Ser Ser Lys Val
1840 1845 1850 1855 

GIG GOG COC GGG CAG CTG GTC CAT TTT CAG ATC CTG CTG GCT GOC GGC . 5614 
Val Ala Pro Gly Gln Leu Val His Phe Gln Ile Leu Leu Ala Ala Gly
1860 1865 1870 

TCA GCT GTC AOC TTC OGC CTG CAG- GTC GGC GGG GOC AAC DOC GAG GTC . j ' 5662 
Ser Ala Val Thr Phe Arg Leu Gln Val Gly Gly Ala Asn Pro Glu Val
1875 1880 1885 

CTC DOC GGG OOC OCT TTC TOC CAC- AGC TTC OOC OGC- GTC GGA" GAC CAC " 5710 
Leu Pro Gly Pro Arg Phe Ser His Ser Phe Pro Arg Val Gly Asp His
1890 1895 1900 

GTG GTG AGC GTG OQG GGC AAA AAC- CAC GTG AGG, TOG GOC CAG GOG CAG ' 5758 
Val Val Ser Val Arg Gly Lys Asn His Val Ser Trp Ala Gln Ala Gln
1905 1910 1915 
GIG OGC ATC GTC GTG CTG GAG GOC GIG AGT GGG CTG CAG ATG OOC AAC 5806
Val Arg Ile Val Val Leu Glu Ala Val Ser Gly Leu Gln Met Pro Asn
1920 1925 1930 1935

TGC TGC GAG OCT GGC ATC GOC AOG GGC ACT ; GAG AGG ' AAC TIC- AGA GOC 5854 
Cys Cys Glu Pro Gly Ile Ala Thr Gly Thr Glu Arg Asn Phe Thr Ala
1940 1945 1950 

OGC GIG CAG r CQC GGC TCT OGG GTC GOC TAG GOC TGG TAG TTC TOG CIG 5902 
Arg Val Gln Arg Gly Ser Arg Val Ala Tyr Ala Trp Tyr Phe Ser Leu
1955 1960 1965 

GAG AAG GTC CAG GGC GAC TOG CIG GIC ATC CIG TOG GOC OGC GAC GTC 5950 
Gln Lys Val Gln Gly Asp Ser Leu Val Ile Leu Ser Gly Arg Asp Val
1970 1975 1980 

AOC TAG AOG C0C GIG GOC GOG GOG CIG TTG GAG ATC CAG GIG OGC GOC- 5998 
Thr Tyr Thr Pro Val Ala Ala Gly Leu Leu Glu Ile Gln Val Arg Ala
1985 1990 1995

TIC AAC GOC- CTG GGC AGT GAG AAC- OGC AOG CTG GIG CTG GAG GIT CAG ^ 6046 
Phe Asn Ala Leu Gly Ser Glu Asn Arg Thr Leu Val Leu Glu Val Gln
2000 2005 2010 2015 

GAC GOC GTC CAG TAT GIG GOC CIG CAG ; AGC GGC O0C TGC TTC AOC AAC 6094 
Asp Ala Val Gln Tyr Val Ala Leu Gln Ser Gly Pro Cys Phe Thr Asn
2020 2025 2030 

OX TOG GOG CAG TTT GM : 00C (SGO-AX-i^-dOC'AGC CDC OGG OGT GIG 6142 
Arg Ser Ala Gln Phe Glu Ala Ala Thr Ser Pro Ser Pro Arg Arg Val
2035 2040 2045

GOC TAC CAC TGG GAC TTT GOG GAT GOG TOG OCA GOG CAG GAC AGA GAT 6190 
Ala Tyr His Trp Asp Phe Gly Asp Gly Ser Pro Gly Gln Asp Thr Asp
2050 2055 2060 

GAG GOC AGG GOC GAG CAC TOC TAC CIG AGG OCT GGG GAC TAC OGC GIG - 6238 
Glu Pro Arg Ala Glu His Ser Tyr Leu Arg Pro Gly Asp Tyr Arg Val 
2065 2070 2075 

CAG GIG AAC GOC TOC AAC CIG GTG AGC TIC TIC GIG GOG CAG GOC AOG 6286 
Gln Val Asn Ala Ser Asn Leu Val Ser Phe Phe Val Ala Gln Ala Thr
2080 2085 2090 2095 

GTG AOC GTC CAG GIG CIG GOC TGC OGG GAG COG GAG GIG GAC GIG GIC . 6334 
Val Thr Val Gln Val Leu Ala Cys Arg Glu Pro Glu Val Asp Val Val
2100 2105 2110

CIG GOC CTG CAG GTG CTG ATG OGG OGA TCA CAG OGC AAC TAC TIG GAG 6382 
Leu Pro Leu Gln Val Leu Met Arg Arg Ser Gln Arg Asn Tyr Leu Glu
2115 2120 2125

GOC CAC GIT GAC CIG OGC GAC TGC GIC AOC TAC CAG ACT GAG TAC OGC ; 6430 
Ala His Val Asp Leu Arg Asp Cys Val Thr Tyr Gln Thr Glu Tyr Arg
2130 2135 2140 

TGG GAG GTG TAT OGC AOC GOC AGC TGC CAG OGG OOG GGG OGC OCA GOG 6478 
Trp Glu Val Tyr Arg Thr Ala Ser Cys Gln Arg Pro Gly Arg Pro Ala
2145 2150 2155 
OCT GIG GOC CTG OOC GGC GTG GAC GTC AGC OGG OCT OGG CTG GIG CTG 6526
Arg Val Ala Leu Pro Gly Val Asp Val Ser Arg Pro Arg Leu Val Leu
2160 2165 2170 2175 

COG OGG CTG GOG GTG OCT GTG GQG'CAC TAG TGC TTT GTG TTT ; GTC, CTG. , 6574 
Pro Arg Leu Ala Leu Pro Val Gly His Tyr Cys Phe Val Phe Val Val
2180 2185 2190

TCA ITT GGG GAC AOG OCA' CTG ACA CAG AGC ATC CAG GOC AAT GTG AOG , 6622 
Ser Phe Gly Asp Thr Pro Leu Thr Gln Ser Ile Gln Ala Asn Val Thr
2195 2200 2205

GTG GOC OOC GAG OQC CTG GTG OGC ATC ATT -GAG GCT.GGC TCA TAG OGC 6670 
Val Ala Pro Glu Arg Leu Val Pro Ile Ile Glu Gly Gly Ser Tyr Arg
2210 2215 2220

CTG TOG TCA GAC. ACA OGG GAC CTG, GTG CTG GAT GGG AGC* GAG TOC TAG" ; 6718 
Val Trp Ser Asp Thr Arg Asp Leu Val Leu Asp Gly Ser Glu Ser Tyr
2225 2230 2235

GAC OOC AAC CTG GAG GAG - GGC . GAC CAG AOG OGG. CTC ACT TIC CAG TGG 6766 
Asp Pro Asn Leu Glu Asp Gly Asp Gln Thr Pro Leu Ser Phe His Trp
2240 2245 2250 2255

" GOC TGT GTG OCT TOG. ACA CAG AGG;.GAG "GCT GGC..GGG TGTT GOG CTG AAC . 6814 

Ala Cys Val Ala Ser Thr Gln Arg Glu Ala Gly Gly Cys Ala Leu Asn
2260 2265 2270

TTT QQG O^ C^ G^ ,6862 
Phe Gly Pro Arg Gly Ser Ser Thr Val Thr Ile Pro Arg Glu Arg Leu
2275 2280 2285

/.GOG GCT GGC CTG GAG TACAOC TTC AGC CTG AGC GTG TGG AAGGOC GGC . 6910 
Ala Ala Gly Val Glu Tyr Thr Phe Ser Leu Thr Val Trp Lys Ala Gly
2290 2295 2300 

OQC AAG GAG GAG GOC AGC AAC CAG AOG CTG CTG ATC OGG ACT GGC OGG 6958 
Arg Lys Glu Glu Ala Thr Asn Gln Thr Val Leu Ile Arg Ser Gly Arg
2305 2310 2315 

GTG OOC ATT GTG TOC ,TTC GAG TCT -GTG TOC TGC AAG GCA CAG GOC GTG . , 7006 
Val Pro Ile Val Ser Leu Glu Cys Val Ser Cys Lys Ala Gln Ala Val
2320 2325 2330 2335

TAC GAA GTG AGC OGC AGC TOC TAC GTG TAG TTG GAG GGC OGC TGC CTC 7054
Tyr Glu Val Ser Arg Ser Ser Tyr Val Tyr Leu Glu Gly Arg Cys Leu
2340 2345 2350

AAT TGC AGC AGC GGC TOC AAG OGA GGG OGG TGG GCT GCA OCT AOG TTC , ... 7102 
Asn Cys Ser Ser Gly Ser Lys Arg Gly Arg Trp Ala Ala Arg Thr Phe 

2355 2360 2365

AGC AAC AAG AOG CTG GTG CTG GAT GAG AGC AOC ACA TOC AOG GGC ACT . . 7150 
Ser Asn Lys Thr Leu Val Leu Asp Glu Thr Thr Thr Ser Thr Gly Ser 
2370 2375 2380

GCA GGC ATG OGA CTG GTG CTG OGG OGG GGC GTG CTG OGG GAC GGC GAG 7198 
Ala Gly Met Arg Leu Val Leu Arg Arg Gly Val Leu Arg Asp Gly Glu 
2385 2390 2395
GGA TAC AOC TTC AOG CTC AOG GTG CTG GGC CGC TCT GGC GAG GAG GAG 7246
Gly Tyr Thr Phe Thr Leu Thr Val Leu Gly Arg Ser Gly Glu Glu Glu 
2400 2405 2410 2415 

GGC TOC GOC TOC ATC OC3C CTG TOC OOC AAC OGC COG COG CTG GOG GGC 7294 
Gly Cys Ala Ser Ile Arg Leu Ser Pro Asn Arg Pro Pro Leu Gly Gly
2420 2425 2430 

TCT TOC OGC CTC TTC CCA CTG GGC* OCT GTG CAC GOC CTC AOC AOC AAG 7342 
Ser Cys Arg Leu Phe Pro Leu Gly Ala Val His Ala Leu Thr Thr Lys 
2435 2440 2445

GTG CAC TTC GAA TOC AOG GGC TOG CAT GAC GOG GAG GAT GCT'GQC GOC 7390 
Val His Phe Glu Cys Thr Gly Trp His Asp Ala Glu Asp Ala Gly Ala 
2450 2455 2460

00G CTG GIG TAC GOC CTG CTG CTG OGG OGC TCT OGC CAG GGC CAC TOC 7438 
Pro Leu Val Tyr Ala Leu Leu Leu Arg Arg Cys Arg Gln Gly His Cys
2465 2470 2475

GAG GAG TTC TCT GTC TAC AAG GGC AOC CTC TOC AOC TAC GGA GOC GIG 7486 
Glu Glu Phe Cys Val Tyr Lys Gly Ser Leu Ser Ser Tyr Gly Ala Val 
2480 2485 2490 2495

CTG OOC COG OCT TTC AGG OCA CAC TIC GAG GIG GGC CTG GOC GTG CTG ,7534 
Leu Pro Pro Gly Phe Arg Pro His Phe Glu Val Gly Leu Ala Val Val 
2500 2505 2510

GIG CAG GAC CAG CTG GGA GOC OCT GIG GTC GOC CTC AAC AGG TCT TTC 7582 
Val Gln Asp Gln Leu Gly Ala Ala Val Val Ala Leu Asn Arg Ser Leu
2515 2520 2525

GOC ATC AOC CTC OCA GAG OOC AAC GGC AGC GCA AOG GGG CTC ACA GTC 7630 
Ala Ile Thr Leu Pro Glu Pro Asn Gly Ser Ala Thr Gly Leu Thr Val
2530 2535 2540

TOG CTG CAC GGG CTC ACC GCT AGT GTC CTC OCA GGG CTG CTG OGG CAG 7678 
Trp Leu His Gly Leu Thr Ala Ser Val Leu Pro Gly Leu Leu Arg Gln
2545 2550 2555

GOC GAT OOC CAG CAC GTC ATC GAG TAC TOG TTC GOC CTG GTC AOC GTC 7726 
Ala Asp Pro Gln His Val Ile Glu Tyr Ser Leu Ala Leu Val Thr Val
2560 2565 2570 2575

CTG AAC GAG TAC GAG CGG GOC CTG GAC GTG GOG GCA GAG OOC AAG CAC 7774 
Leu Asn Glu Tyr Glu Arg Ala Leu Asp Val Ala Ala Glu Pro Lys His 
2580 2585 2590

GAG OGG CAG CAC OGA GOC CAG ATA OGC AAG AAC ATC AOG GAG ACT CTG 7822 
Glu Arg Gln His Arg Ala Gln Ile Arg Lys Asn Ile Thr Glu Thr Leu
2595 2600 2605

GTC TOC CTG AGG GTC CAC ACT GIG GAT GAC ATC CAG CAG ATC GCT GCT 7870 
Val Ser Leu Arg Val His Thr Val Asp Asp Ile Gln Gln Ile Ala Ala
2610 2615 2620 

GOG CTG GOC CAG TOC ATC GGG OOC AGC AGG GAG CTC GTA TOC OGC TOG 7918 
Ala Leu Ala Gln Cys Met Gly Pro Ser Arg Glu Leu Val Cys Arg Ser
2625 2630 2635
TOC CIG AAG CAG AOG CIG CAC AAG CPS GAG GOC ATC ATC CTC ATC CIG 7966
Cys Leu Lys Gln Thr Leu His Lys Leu Glu Ala Met Met Leu Ile Leu
2640 2645 2650 2655

CAG OCA GAG AGC AOC GGG GOC AOC ; CTG AOG OOC AOC GOC ATC GGA GAC 8014 
Gln Ala Glu Thr Thr Ala Gly Thr Val Thr Pro Thr Ala Ile Gly Asp
2660 2665 2670

AGC ATC CTC AAC ATC ACA GGA GAC CTC: ATC CAC CIG GOC AGC TOG GAC 8062 
Ser Ile Leu Asn Ile Thr Gly Asp Leu Ile His Leu Ala Ser Ser Asp
2675 2680 2685

GIG COG GCA CCA CAG OOC TCA GAG CIG GGA. GOC GAG TCA OCA TCT OGG 8110 
Val Arg Ala Pro Gln Pro Ser Glu Leu Gly Ala Glu Ser Pro Ser Arg
2690 2695 2700

ATG GIG GOG TOC CAG GOC TAG AAC CIG AOC TCT GOC CTC ATG OGC ATC / 8158 
Met Val Ala Ser Gln Ala Tyr Asn Leu Thr Ser Ala Leu Met Arg Ile
2705 2710 2715

CTC ATG OGC TOC OGC GIG CTC AAC GAG GAG OOC CTG. AOG CIG GOG GOC . 8206 
Leu Met Arg Ser Arg Val Leu Asn Glu Glu Pro Leu Thr Leu Ala Gly 
2720 2725 2730 2735

GAG GAG ATC GTC GOC CAG GGC AAG OGC TOG GAC COG COG AGC CTG CTC 8254 ~ 

Glu Glu Ile Val Ala Gln Gly Lys Arg Ser Asp Pro Arg Ser Leu Leu
2740 2745 2750

TCC TAT GOC GGC GOC OCA GGG OCT GGC TOC CAC TIC TOC ATC, OOC GAG 8302 
Cys Tyr Gly Gly Ala Pro Gly Pro Gly Cys His Phe Ser Ile Pro Glu
2755 2760 2765

OCT TIC AGC GGG GOC CIG GOC AAC CTC ACT GAC GTG GTG CAG CTC ATC 8350 
Ala Phe Ser Gly Ala Leu Ala Asn Leu Ser Asp Val Val Gln Leu Ile
2770 2775 2780

TTT CTC GIG GAC TOC AAT OOC TIT OOC TIT GGC TAT' ATC AGC AAC TAC j 8398 
Phe Leu Val Asp Ser Asn Pro Phe Pro Phe Gly Tyr Ile Ser Asn Tyr
2785 2790 2795

AOC GTC TOC AOC AAG GTG GOC TOG ATC GCA TIC CAG ACA CAG* GOC GGC 8446 
Thr Val Ser Thr Lys Val Ala Ser Met Ala Phe Gln Thr Gln Ala Gly
2800 2805 2810 2815

GOC CAG ATC COC ATC GAG OGG CTC GOC TCA GAG OGC GOC ATC AOC GIG % 8494 
Ala Gln Ile Pro Ile Glu Arg Leu Ala Ser Glu Arg Ala Ile Thr Val
2820 2825 2830

AAG GTC OOC AAC AAC TOG GAC TOG GOT GOC, OGG GGC CAC OGC AGC TOC ** ' 8542 
Lys Val Pro Asn Asn Ser Asp Trp Ala Ala Arg Gly His Arg Ser Ser 
2835 2840 2845

GOC AAC TOC GOC AAC* TOC, GIT GTG GTC CAG OOC CAG GOC TOC GTC OCT 8590 
Ala Asn Ser Ala Asn Ser Val Val Val Gln Pro Gln Ala Ser Val Gly
2850 2855 2860

GCT GTC GTC AOC CIG GAC AGC. AGC AAC OCT GOG .GOC GGG CTG CAt'cIG ^ 8638 
Ala Val Val Thr Leu Asp Ser Ser Asn Pro Ala Ala Gly Leu His Leu 
2865 2870 2875 
CAG CTC AAC TAT AOG CTG CTG GAC GGC CAC TAC CTG TCT GAG GAA OCT 8686
Gln Leu Asn Tyr Thr Leu Leu Asp Gly His Tyr Leu Ser Glu Glu Pro
2880 2885 2890 2895

. GAG OCT TAC CIG OCA GTC TAC CTA CAC TOG GAG OOC COG COC AAT GAG 8734 
Glu Pro Tyr Leu Ala Val Tyr Leu His Ser Glu Pro Arg Pro Asn Glu 
2900 2905 2910

CAC AAC TGC TOG OCT AGC AGG AGG ATC OGC OCA GAG TCA CTC CAG GGT 8782 
His Asn Cys Ser Ala Ser Arg Arg Ile Arg Pro Glu Ser Leu Gln Gly
2915 2920 2925 

OCT GAC CAC CGG COC TAC AOC TIC TIC ATT TOC 'COG GGG AGC AGA GAG 8830 
Ala Asp His Arg Pro Tyr Thr Phe Phe Ile Ser Pro Gly Ser Arg Asp
2930 2935 2940

OCA GOG GGG ACT TAC CAT CTG AAC CTC T0C AGC CAC TIC OGC TOG TOG 8878 
Pro Ala Gly Ser Tyr His Leu Asn Leu Ser Ser His Phe Arg Trp Ser 
2945 2950 2955

GOG CTG CAG GTG T0C GIG GGC CTG TAC AOG TOC CTG TGC CAG TAC TIC ' 8926 
Ala Leu Gln Val Ser Val Gly Leu Tyr Thr Ser Leu Cys Gln Tyr Phe
2960 2965 2970 2975

AGC GAG GAG GAC ATG GIG TOG OGG ACA GAG GGG CTG CTG OOC CTG GAG 8974 
Ser Glu Glu Asp Met Val Trp Arg Thr Glu Gly Leu Leu Pro Leu Glu 
2980 2985 2990

GAG AOC TOG OCC OGC CAG GOC GTC TGC CTC AOC OGC CAC CTC AOC GOC 9022 
Glu Thr Ser Pro Arg Gln Ala Val Cys Leu Thr Arg His Leu Thr Ala
2995 3000 3005

TIC GGC GOC AGC CTC TIC GIG OOC OCA AGC CAT GTC OGC TTT GTG TTT 9070 
Phe Gly Ala Ser Leu Phe Val Pro Pro Ser His Val Arg Phe Val Phe
3010 3015 3020

OCT GAG COG ACA GOG GAT CTA AAC TAC ATC GTC ATG CTG ACA TCT GCT 9118 
Pro Glu Pro Thr Ala Asp Val Asn Tyr Ile Val Met Leu Thr Cys Ala
3025 . 3030 3035 



GIG TGC CTG GIG AOC TAC ATG GTC ATG GOC GOC ATC CTG CAC AAG CTG 9166 
Val Cys Leu Val Thr Tyr Met Val Met Ala Ala Ile Leu His Lys Leu
3040 3045 3050 3055

GAC CAG TTG GAT GOC AGC OGG GGC OGC GOC ATC OCT TIC TGT GGG CAG 9214 
Asp Gln Leu Asp Ala Ser Arg Gly Arg Ala Ile Pro Phe Cys Gly Gln
3060 3065 3070

OGG GGC OGC TTC AAG TAC GAG ATC CTC GTC AAG ACA GGC TOG GGC OGG 9262 
Arg Gly Arg Phe Lys Tyr Glu Ile Leu Val Lys Thr Gly Trp Gly Arg
3075 3080 3085

GGC TCA GGT AOC ACG GOC CAC GTG GGC ATC ATG CTG TAT GGG GIG GAC 9310 
Gly Ser Gly Thr Thr Ala His Val Gly Ile Met Leu Tyr Gly Val Asp
3090 3095 3100

AGC OGG AGC GGC CAC CGG CAC CTG GAC GGC GAC AGA GOC TTC CAC OGC 9358 
Ser Arg Ser Gly His Arg His Leu Asp Gly Asp Arg Ala Phe His Arg
3105 3110 3115
AAC AGC CTG GAC ATC TIC OGG ATC GOC AOC OOG GAC AGC CIG GGT AGC 9406
Asn Ser Leu Asp Ile Phe Arg Ile Ala Thr Pro His Ser Leu Gly Ser
3120 3125 3130 3135 

GIG TOG AAG ATC OGA GIG TOG CAC GAC AAC MA GGG CIG AGC OCT GOC - ,9454 
Val Trp Lys Ile Arg Val Trp His Asp Asn Lys Gly Leu Ser Pro Ala
3140 3145 3150 



TOG TIC CTG CAG CAC GIG ATC GIG AGG GAC CTG CAG AOG GCA OGC AGC 9502 
Trp Phe Leu Gln His Val Ile Val Arg Asp Leu Gln Thr Ala Arg Ser
3155 3160 3165 

GOC TIG TIG CIG GIG AAT GAC TOG CTT TOG GIG GAG AOG GAG GOC AAC 9550 
Ala Phe Phe Leu Val Asn Asp Trp Leu Ser Val Glu Thr Glu Ala Asn 
3170 3175 3180

GGG GGC CTG GIG GAG AAG GAG GIG CIG GOC GOG AGC GAC GCA GOC CTT 9598 
Gly Gly Leu Val Glu Lys Glu Val Leu Ala Ala Ser Asp Ala Ala Leu
3185 3190 3195

TIG OGC TIG OGG GGC CIG CTG GIG GOT GAG CIG CAG GGT GGC TIG TIT ^9646 v 

Leu Arg Phe Arg Arg Leu Leu Val Ala Glu Leu Gln Arg Gly Phe Phe
3200 3205 3210 3215 

GAC AAG CAC ATC TOG CIG TOC ATA TOG GAG OGG OOG OCT GGT AGC OGT '9694 & 
Asp Lys His Ile Trp Leu Ser Ile Trp Asp Arg Pro Pro Arg Ser Arg
3220 3225 3230 

TIG ACT OGC ATC CAG AGG GOC AOC TOC TOG GIT CIG CIG ATC TOC CIG ~ -V 9742 
Phe Thr Arg Ile Gln Arg Ala Thr Cys Cys Val Leu Leu Ile Cys Leu
3235 3240 3245 

TIG CIG GGC GOC AAC GOC GIG TOG TAG GGG GCT GIT GGC GAC TCT GOC 9790 
Phe Leu Gly Ala Asn Ala Val Trp Tyr Gly Ala Val Gly Asp Ser Ala
3250 3255 3260 

TAG AGC AOG GGG CAT GIG TOC AGG CTC AGC 1 OOG CIG AGC GIG GAC ACA * 9838 
Tyr Ser Thr Gly His Val Ser Arg Leu Ser Pro Leu Ser Val Asp Thr 
3265 3270 3275 

GIG GCT GIT GGC CIG GIG TOC AGC GIG GIT GIG TAT GOC GIG TAC CIG - 98B6 
Val Ala Val Gly Leu Val Ser Ser Val Val Val Tyr Pro Val Tyr Leu 
3280 3285 3290 3295 

GOC ATC CTT TIT CIG TIC OGG ATG TOC OGG AGC AAG GIG GCT GGG AGC 9934 
Ala Ile Leu Phe Leu Phe Arg Met Ser Arg Ser Lys Val Ala Gly Ser
3300 3305 3310 

OOG AGC OOC ACA OCT GOC GGG CAG CAG GIG. CTG GAC ATG GAC, AGC TOC . - 9982 
Pro Ser Pro Thr Pro Ala Gly Gln Gln Val Leu Asp Ile Asp Ser Cys
3315 3320 3325 

CTG GAC TOG TOC GIG CIG GAC AGC TOC TIC CIG AOG TTC. TCA GGC CIG - 10030 
Leu Asp Ser Ser Val Leu Asp Ser Ser Phe Leu Thr Phe Ser Gly Leu
3330 3335 3340 

CAC GCT GAG GOC TIT' GIT GGA CAG. ATC AAG ACT GAG TIG TIT CIG GAT * 10078 
His Ala Glu Ala Phe Val Gly Gln Met Lys Ser Asp Leu Phe Leu Asp
3345 3350 3355 
GAT TCT AAG ACT CTG GIG TGC TOG GOC TOC GGC GAG GGA ACG CTC ACT 10126
Asp Ser Lys Ser Leu Val Cys Trp Pro Ser Gly Glu Gly Thr Leu Ser
3360 3365 3370 3375

TOG OOG GAC CTG CTC ACT GAC OOG TOC AIT GIG OCT AGC AAT CTG OGG 10174 
Trp Pro Asp Leu Leu Ser Asp Pro Ser Ile Val Gly Ser Asn Leu Arg
3380 3385 3390

CAG CTG GCA CGG GGC CAG GGG GGC CAT GGG CTG GGC CCA GAG GAG GAC 10222 
Gln Leu Ala Arg Gly Gln Ala Gly His Gly Leu Gly Pro Glu Glu Asp
3395 3400 3405

GGC TIC TOC CTG GOC AGC C0C TAC TOG OCT GOC AAA TOC TTC TCA GCA 10270 
Gly Phe Ser Leu Ala Ser Pro Tyr Ser Pro Ala Lys Ser Phe Ser Ala
3410 3415 3420

TCA GAT GAA GAC CTG ATC CAG CAG CTC CTT V GOC GAG GGG CTC AGC AGC 10318 
Ser Asp Glu Asp Leu Ile Gln Gln Val Leu Ala Glu Gly Val Ser Ser
3425 3430 3435

CCA GOC OCT AOC CAA GAC AOC CAC ATG GAA ACG GAC CTG CTC AGC AGC 10366 
Pro Ala Pro Thr Gln Asp Thr His Met Glu Thr Asp Leu Leu Ser Ser
3440 3445 3450 3455

CTG TOC AGC ACT OCT GGG GAG AAG ACA GAG ACG CTG GOG CTG CAG AGG 10414 
Leu Ser Ser Thr Pro Gly Glu Lys Thr Glu Thr Leu Ala Leu Gln Arg
3460 3465 3470

CTG GGG GAG CTG GGG CCA GOC AGC CCA GGC CTG AAC TOG GAA CAG COC 10462 
Leu Gly Glu Leu Gly Pro Pro Ser Pro Gly Leu Asn Trp Glu Gln Pro
3475 3480 3485

CAG GCA GOG AGG CTG TOC AGG ACA GGA CTG GIG GAG GOT CTG CGG AAG 10510 
Gln Ala Ala Arg Leu Ser Arg Thr Gly Leu Val Glu Gly Leu Arg Lys
3490 3495 3500

CQC CTG CTG OOG GOC TGG TCT GOC TOC CTG GOC CAC GGG CTC AGC CTG 10558 
Arg Leu Leu Pro Ala Trp Cys Ala Ser Leu Ala His Gly Leu Ser Leu 
3505 3510 3515

CTC CTG GIG GCT CTG GCT GTG GCT GTC TCA GGG TGG GIG GGT GOG AGC 10606 
Leu Leu Val Ala Val Ala Val Ala Val Ser Gly Trp Val Gly Ala Ser 
3520 3525 3530 3535 

TIC OOC OOG GGC CTG ACT GIT GOG TGG CTC CTG TOC AGC AGC GOC AGC 10654 
Phe Pro Pro Gly Val Ser Val Ala Trp Leu Leu Ser Ser Ser Ala Ser 
3540 3545 3550

TTC CTG GOC TCA TTC CTC GGC TGG GAG OCA CTG AAG GTC TTG CTG GAA 10702 
Phe Leu Ala Ser Phe Leu Gly Trp Glu Pro Leu Lys Val Leu Leu Glu 
3555 3560 3565

GOC CTG TAC TTC TCA CTG GTG GOC AAG OGG CTG CAC OOG GAT GAA GAT 10750 
Ala Leu Tyr Phe Ser Leu Val Ala Lys Arg Leu His Pro Asp Glu Asp
3570 3575 3580

GAC AGC CTG CTA GAG AGC COG GCT GTG ACG OCT GTG AGC GCA OCT GTG 10798 
Asp Thr Leu Val Glu Ser Pro Ala Val Thr Pro Val Ser Ala Arg Val 
3585 3590 3595
OOC OGC GTA OGG OCA OOC CAC GGC TTT GCA CTC TIC CTG GOC AAG GAA 10846
Pro Arg Val Arg Pro Pro His Gly Phe Ala Leu Phe Leu Ala Lys Glu 
3600 3605 3610 3615

.GAA GOC OGC AAG GTC AAG AdG CTA CAT GGC ATG CTG OGG £GC CTC CTG 10894 
Glu Ala Arg Lys Val Lys Arg Leu His Gly Met Leu Arg Ser Leu Leu 
3620 3625 3630

CTG TAC ATG CTT TIT CTG CTG GTS AOC CTG CTG GOC AGC TAT GGG GAT * 10942 
Val Tyr Met Leu Phe Leu Leu Val Thr Leu Leu Ala Ser Tyr Gly Asp
3635 3640 3645



GOC TCA TQC CAT GGG CAC GOC TAC OCT CTG CAA AGC GOC ATC AAG CAG 10990 
Ala Ser Cys His Gly His Ala Tyr Arg Leu Gln Ser Ala Ile Lys Gln
3650 3655 3660

GAG CTG CAC AGC OQG GOC TIC CTG GOC ATC AOG OGG TOT GAG GAG CTC f < - 11038 
Glu Leu His Ser Arg Ala Phe Leu Ala Ile Thr Arg Ser Glu Glu Leu
3665 3670 3675

TOG OCA TQG ATG GOC CAC GIG CTG CTG OOC TAC GTC CAC GGG AAC CAG 11086 
Trp Pro Trp Met Ala His Val Leu Leu Pro Tyr Val His Gly Asn Gln
3680 3685 3690 3695

TOC AGC OCA GAG CTG GGG OOC OCA COG" CTG OQG CAG GIG OGG CTG CAG 11134 
Ser Ser Pro Glu Leu Gly Pro Pro Arg Leu Arg Gln Val Arg Leu Gln
3700 3705 3710

GAA GCA CTC TAC o£a- GAC OCT OOC GGC Go£ AGG GTC CAC AOG TOC TOG ! '11182 
Glu Ala Leu Tyr Pro Asp Pro Pro Gly Pro Arg Val His Thr Cys Ser 
3715 3720 3725

GOC GCA GGA GGC TTC AGC AOC AGC GAT TAC\GAC GTT GGC TOG GAG AGT 11230 
Ala Ala Gly Gly Phe Ser Thr Ser Asp Tyr Asp Val Gly Trp Glu Ser 
3730 3735 3740

OCT CAC AAT GOC TOG GGG AGG TC3G GOC'tAT TCA GOG OOG GAT CTG CTG 11278 
Pro His Asn Gly Ser Gly Thr Trp Ala Tyr Ser Ala Pro Asp Leu Leu 
3745 3750 3755

GGG GCA TGG TOC TOG GGC TOC TOT GOC GIG TAT GAC AGG GGG GGC TAC 11326 
Gly Ala Trp Ser Trp Gly Ser Cys Ala Val Tyr Asp Ser Gly Gly Tyr 
3760 3765 3770 3775

GIG CAG GAG CTC . GGC CTG AGC CTG GAG GAG AGC OGC GAC OGG CTG OGC 11374 
Val Gln Glu Leu Gly Leu Ser Leu Glu Glu Ser Arg Asp Arg Leu Arg
3780 3785 3790

TIC CTG CAG CTG CAC AAC TGG CTG GAC AAC AGG AGC OGC OCT GIG TTC ' * 11422 
Phe Leu Gln Leu His Asn Trp Leu Asp Asn Arg Ser Arg Ala Val Phe
3795 3800 3805

CTG GAG CTC AOG OGC TAC AGC OOG GCC GIG GGG CTG CAC GOC GOC GTC 11470 
Leu Glu Leu Thr Arg Tyr Ser Pro Ala Val Gly Leu His Ala Ala Val 
3810 3815 3820

AOG CTG OGC CTC GAG TIC OOG GOG GOC GGC OGC GOC CTG GOC GOC CTC 11518 
Thr Leu Arg Leu Glu Phe Pro Ala Ala Gly Arg Ala Leu Ala Ala Leu 
3825 3830 3835 
AGC GTC OGC OOC TTT GOG CTG OGC OGC CTG AGC GOG GGC CTC TOG CTG 11566
Ser Val Arg Pro Phe Ala Leu Arg Arg Leu Ser Ala Gly Leu Ser Leu
3840 3845 3850 3855

OCT - CTG CTC AOC TOG GTG TGC CTG CTG CTG TIC GOC CTG CAC TIC GOC . . 11614 
Pro Leu Leu Thr Ser Val Cys Leu Leu Leu Phe Ala Val His Phe Ala 
3860 3865 3870

GTG GOC GAG GOC OCT ACT TGG CAC AGG GAA GGG CGC TGG OGC GTG CTG 11662 
Val Ala Glu Ala Arg Thr Trp His Arg Glu Gly Arg Trp Arg Val Leu
3875 3880 3885

CGG CTC GGA GOC TGG GOG CGG TGG CTG CTG GTG GOG CTG AOG GOG GOC 11710 . 
Arg Leu Gly Ala Trp Ala Arg Trp Leu Leu Val Ala Leu Thr Ala Ala 
3890 3895 3900

ACE'C^CIGCTACGCCnCGOCCAGCTC , 11758,. 

Thr Ala Leu Val Arg Leu Ala Gln Leu Gly Ala Ala Asp Arg Gln Trp
3905 3910 3915

ACC.OGT TIC CTG OGC GGC CGC COG CGC CGG TTG. ACT AGC TIC GAC CAG . 11806 . 
Thr Arg Phe Val Arg Gly Arg Pro Arg Arg Phe Thr Ser Phe Asp Gln
3920 3925 3930 3935

GTG GOG CAC GTG AGC TOC GCA GOC OCT GGC CTG GOG GOC TOG CTG CTC ... 11854 
Val Ala His Val Ser Ser Ala Ala Arg Gly Leu Ala Ala Ser Leu Leu
3940 3945 3950

TIC CTG CTT'TTG CTC .AAG GCT GOC CAG ,GAC CTA CGC TTC CTG CGC CAG 11902 . 
Phe Leu Leu Leu Val Lys Ala Ala Gln His Val Arg Phe Val Arg Gln
3955 3960 3965

TGG TOC CTC TIT GGC AAG ACA.TTA TGC CGAGCT CTG OCA GAG CTC CTG. , 11950 
Trp Ser Val Phe Gly Lys Thr Leu Cys Arg Ala Leu Pro Glu Leu Leu
3970 3975 3980

GGG GIC AOC TTG GGC" CTG GTG GTG CTC GGG CTA GOC TAC GOC CAG CTG t 11998 
Gly Val Thr Leu Gly Leu Val Val Leu Gly Val Ala Tyr Ala Gln Leu
3985 3990 3995 

GOC ATC CTG, CTC GTS TCT TOC TCT GIG GAC TOC CTC TGG AGC CTG GOC 12046 
Ala Ile Leu Leu Val Ser Ser Cys Val Asp Ser Leu Trp Ser Val Ala
4000 4005 4010 4015

CAG GOC CTG" TTG GTG CTG TGC : CCT GGG ACT GGG -CTC -TCT AOC CTG TCT .12094 
Gln Ala Leu Leu Val Leu Cys Pro Gly Thr Gly Leu Ser Thr Leu Cys
4020 4025 4030

CCT GOC GAG;TOC TGG CAC CTG-TCA CCC CTG CTG TCT GTG GGG CTC TGG. 12142 
Pro Ala Glu Ser Trp His Leu Ser Pro Leu Leu Cys Val Gly Leu Trp
4035 4040 4045

GCA CTG CGG' CTG TGG GGC GOC CTA OGG CTG GGG GCT GTT ATT- CTC CGC . 12190 
Ala Leu Arg Leu Trp Gly Ala Leu Arg Leu Gly Ala Val Ile Leu Arg
4050 4055 4060

TOG CGC TAC CAC GOC TTG OCT GGA GAG .CTG TAC CGG COG GOC TGG GAG 12238 
Trp Arg Tyr His Ala Leu Arg Gly Glu Leu Tyr Arg Pro Ala Trp Glu
4065 4070 4075
OOC GAG GAC TAG GAG ATG GIG GAG TIG TIC CTG OGC AGG CTG OGC CTC 12286
Pro Gln Asp Tyr Glu Met Val Glu Leu Phe Leu Arg Arg Leu Arg Leu
4080 4085 4090 4095

TGG ATG GQC CTC AGC AAG GTC AAG GAG TTC GGC CAE AAA CTC OGC ITT ::■ ' 12334; 
Trp Met Gly Leu Ser Lys Val Lys Glu Phe Arg His Lys Val Arg Phe
4100 4105 4110 

GAA GGG ATG GAG 00G CTG CCC TCT OGC T0C T0C AGG GGC TOC AAG GTA . ' 12382 
Glu Gly Met Glu Pro Leu Pro Ser Arg Ser Ser Arg Gly Ser Lys Val
4115 4120 4125

TOC 00G GAT GTC CCC CCA CCC AGC GOT GGC TOC GAT GOC TOG CAC OOC r, 12430. v 
Ser Pro Asp Val Pro Pro Pro Ser Ala Gly Ser Asp Ala Ser His Pro
4130 ~ 4135 4140 

TOC AOC TOC TOC AGC CAG CTG GAT GGG CTG AGC GTG AGC CTG GGC OGG - 12478 ' 
Ser Thr Ser Ser Ser Gln Leu Asp Gly Leu Ser Val Ser Leu Gly Arg
4145 4150 - : 4155 

CTG GGG ACA AGG TGT GAG OCT GAG 00C TOC OGC CTC GAA GOC GTG TTC > 12526: 
Leu Gly Thr Arg Cys Glu Pro Glu Pro Ser Arg Leu Gln Ala Val Phe
4160 4165 4170 4175 

GAG GOC CTG ,CTC AOC CAG TTT GAC OGA CTC AAC CAG GOC ACA GAG GAC 12574 ; 
Glu Ala Leu Leu Thr Gln Phe Asp Arg Leu Asn Gln Ala Thr Glu Asp
4180 4185 4190

GTC TAG CAG CTG GAG CAG CAG CIG'GAC : CTO CAA. G&^ ;12622 
Val Tyr Gln Leu Glu Gln Gln Leu His Ser Leu Gln Gly Arg Arg Ser
4195 4200 4205 

AGC OGG GOG OOC GOC GGA iCT TCC CGT GGCCCA'TOC COG -GGC 'CTG OGG , 12670 
Ser Arg Ala Pro Ala Gly Ser Ser Arg Gly Pro Ser Pro Gly Leu Arg
4210 4215 4220

OCA OCA CTG OOC AGC OGC CIT GOC .OGG -GOC ACT* GGG GGT GIG GAC CTG 12718 
Pro Ala Leu Pro Ser Arg Leu Ala Arg Ala Ser Arg Gly Val Asp Leu
4225 4230 4235

GOC ACT GGC COC AGC AGG ACA OCT TOG GGC CAA GAA CAA GGT OCA OOC * 12766. 
Ala Thr Gly Pro Ser Arg Thr Pro Ser Gly Gln Glu Gln Gly Pro Pro
4240 4245 4250 4255 

CAG CAG CAC TTA GTC CTC CTT CCT GGC GGG GGT GGG OGG TGG AGT OGG ' 12814. 
Gln Gln His Leu Val Leu Leu Pro Gly Gly Gly Gly Pro Trp Ser Arg
4260 - 4265 4270 

ACT GGA CAC OGC TCA CTA TTA CTT TCT GOC r GCT CTC AAG GOC GAGTGGC < ' 12862' 
Ser Gly His Arg Ser Val Leu Leu Ser Ala Ala Val Lys Ala Glu Gly
4275 4280 4285

CAG GCA GAA TGG CTG CAC CTA GGT TOC OCA GAG AGC- AGG CAG GGG GAT - 12910 
Gln Ala Glu Trp Leu His Val Gly Ser Pro Glu Ser Arg Gln Gly His
4290 4295 4300

CTG TCT GTC TGT GGG CTT CAG CAC TIT AAA GAG GCT GTG TGG' OCA *ACC 12958 
Leu Ser Val Cys Gly Leu Gln His Phe Lys Glu Ala Val Trp Pro Thr
4305 4310 4315 
AQG AOC CAG GGT COC CIC <XC?AGC'TQC CTT'GGG AAG GAC ACA .GCA GTA 13006
Arg Thr Gln Gly Pro Leu Pro Ser Ser Leu Gly Lys Asp Thr Ala Val
4320 4325 4330 4335

1TC GAC GGT TTC TAGOCTCTGA GATQCTAATT TATTTOCCCG AGTOCTCAGG 13058* 
Leu Asp Gly Phe

TACAGOGGQC TCTGOOOG0C aXAOXIXT GGGCAGATCT OCTXX^CTGC TAAGGCTGCT , 13118 

GGCTICAGGG AGGGTTAGCXrTGCAGOGCCG OCAOOCTOCC CCTAAGTTAT TAOCTCT0CA 13178 

GITOCTACCG TAdTOOCTGC ACI33TCnCAC- TOIGTCTCTC GTCTCAGTAA TTTATATGCT 13238 

GTTAAAATGT GTATATTnT GTATGTCACT ATITICACrA GGGCTGAGGG GCCTGOGOOC 13298 

AGAGCTGGOC T0C00CAACA CCTGCTG0GC TTGGTAGCTG TGCTGG0GTT ATGGCAG0CC ' 13358 

GGCTGCTGCT TGGATOOGAG LTlUUULTlG GGCngKCT QGGGQCACAG OCTCTGOCA - 13418_ 

GGCACTCTCA TCAO00CAGA GGCCTICTCA TCCTOOCITG -GCOCAGQOCA GGTAGCAAGA 13478 

GAGCAGOQOC CAGGOCTOCT GGCATCAGGT CTGGGCAAGT AGCAGGACTA GGCATGTCAG 13538 

AGGAOOCCAG GGTGGTTAGA ' GGAAAAGACT Ocfel OT G G CCTGCCTCCC AGGGIGGAbG 13598 * 

AAGCTGACTG TCICTCTGTC TGICTQ090G CX3GX3ACX30GC GACTGTGCTG TATOGOCCAG 1365.8 

GCAGOCTCAA GGOOCTOGGA GCTOGCICTG (JC'IGLTIUXJ TGTADCACIT CICTGGGCAT 13718 

OCaULIJLTAi:r AGAGCCTOGA CacmXXTA AdDOOObCAC CAAGCAGACA AAGldAATAA 13778 

AWSAGCTGTC TGACTGCAAA AAAAAAAAAV ..v .. . ; ; 13807 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Gly Ala Ala Cys Arg Val Asn Cys Ser Gly Arg Gly Leu Arg Thr Leu 

1 5 10 15

Gly Pro Ala Leu Arg Ile Pro Ala Asp Ala Thr Ala Leu Asp Val Ser
20 25 30

His Asn Leu Leu Arg Ala Leu Asp Val Gly Leu Leu Ala Asn Leu Ser
35 40 45

Ala Leu Ala Glu Leu Asp Ile Ser Asn Asn Lys Ile Ser Thr Leu Glu
50 55 60

Glu Gly Ile Phe Ala Asn Leu Phe Asn Leu Ser Glu Ile Asn Leu Ser
65 70 75 80

Gly Asn Pro Phe Glu Cys Asp Cys Gly Leu Ala Trp Leu Pro Arg Trp
85 90 95

Ala Glu Glu Gln Gln Val Arg Val Val Gln Pro Glu Ala Ala Thr Cys
100 105 110

Ala Gly Pro Gly Ser Leu Ala Gly Gln Pro Leu Leu Gly Ile Pro Leu
115 120 125
Leu Asp Ser Gly Cys Gly Glu Glu Tyr Val Ala Cys Leu Pro Asp Asn
130 135 140

Ser Ser Gly Thr Val Ala Ala Val Ser Phe Ser Ala Ala His Glu Gly
145 150 155 160

Leu Leu Gln Pro Glu Ala Cys Ser Ala Phe Cys Phe Ser Thr Gly Gln
165 170 175

Gly Leu Ala Ala Leu Ser Glu Gln Gly Trp Cys Leu Cys Gly Ala Ala
180 185 190

Gln Pro Ser Ser Ala Ser Phe Ala Cys Leu Ser Leu Cys Ser Gly Pro
195 200 205

Pro Pro Pro Pro Ala Pro Thr Cys Arg Gly Pro Thr Leu Leu Gln His
210 215 220

Val Phe Pro Ala Ser Pro Gly Ala Thr Leu Val Gly Pro His Gly Pro
225 230 235 240

Leu Ala Ser Gly Gln Leu Ala Ala Phe His Ile Ala Ala Pro Leu Pro
245 250 255

Val Thr Ala Thr Arg Trp Asp Phe Gly Asp Gly Ser Ala Glu Val Asp
260 265 270

Ala ; Ala Gly P^ Ala "Ala^Ser His -'Arg -Tyr- Val Leu Pro- Gly 'Arg Tyr 
275 280 285

His Val Thr Ala Val Leu Ala Leu Gly Ala Gly Ser Ala Leu Leu Gly
290 295 300

Thr Asp Val Gln Val Glu Ala Ala Pro Ala Ala Leu Glu Leu Val Cys
305 310 315 320

Pro Ser Ser Val Gln Ser Asp Glu Ser Leu Asp Leu Ser Ile Gln Asn
325 330 335

Arg Gly Gly Ser Gly Leu Glu Ala Ala Tyr Ser Ile Val Ala Leu Gly
340 345 350

Glu Glu Pro Ala Arg Ala Val His Pro 1^ Cys Pro Ser Asp Thr. Glu 
355 360 365 

Ile Phe Pro Gly Asn Gly His Cys Tyr Arg Leu Val Val Glu Lys Ala
370 375 380

Ala Trp Leu Gln Ala Gln Glu Gln Cys Gln Ala Trp Ala Gly Ala Ala
385 390 395 400

Leu Ala Met Val Asp Ser Pro Ala Val Gln Arg Phe Leu Val Ser Arg
405 410 415

Val Thr Arg Ser Leu Asp Val Trp Ile Gly Phe Ser Thr Val Gln Gly
420 425 430

Val Glu Val Gly Pro Ala Pro Gln Gly Glu Ala Phe Ser Leu Glu Ser
435 440 445

Cys Gln Asn Trp Leu Pro Gly Glu Pro His Pro Ala Thr Ala Glu His
450 455 460

Cys Val Arg Leu Gly Pro Thr Gly Trp Cys Asn Thr Asp Leu Cys Ser
465 470 475 480

Ala Pro His Ser Tyr Val Cys Glu Leu Gln Pro Gly Gly Pro Val Gln
485 490 495

Asp Ala Glu Asn Leu Leu Val Gly Ala Pro Ser Gly Asp Leu Gln Gly
500 505 510



Pro Leu Thr Pro Leu Ala Gln Gln Asp Gly Leu Ser Ala Pro His Glu
515 520 525

Pro Val Glu Val Met Val Phe Pro Gly Leu Arg Leu Ser Arg Glu Ala
530 535 540

Phe Leu Thr Thr Ala Glu Phe Gly Thr Gln Glu Leu Arg Arg Pro Ala
545 550 555 560

Gln Leu Arg Leu Gln Val Tyr Arg Leu Leu Ser Thr Ala Gly Thr Pro
565 570 575

Glu Asn Gly Ser Glu Pro Glu Ser Arg Ser Pro Asp Asn Arg Thr Gln
580 585 590

Leu Ala Pro Ala Cys Met Pro Gly Gly Arg Trp Cys Pro Gly Ala Asn 
595 600 605 

Ile Cys Leu Pro Leu Asp Ala Ser Cys His Pro Gln Ala Cys Ala Asn
610 615 620

Gly Cys Thr Ser Gly Pro Gly Leu Pro Gly Ala Pro Tyr Ala Leu Trp
625 630 635 640

Arg Glu Phe Leu Phe Ser Val Ala Ala Gly Pro Pro Ala Gln Tyr Ser
645 650 655

Val Thr Leu His Gly Gln Asp Val Leu Met Leu Pro Gly Asp Leu Val
660 665 670

Gly Leu Gln His Asp Ala Gly Pro Gly Ala Leu Leu His Cys Ser Pro
675 680 685

Ala Pro Gly His Pro Gly Pro Gln Ala Pro Tyr Leu Ser Ala Asn Ala
690 695 700

Ser Ser Trp Leu Pro His Leu Pro Ala Gln Leu Glu Gly Thr Trp Ala
705 710 715 720

Cys Pro Ala Cys Ala Leu Arg Leu Leu Ala Ala Thr Glu Gln Leu Thr
725 730 735

Val Leu Leu Gly Leu Arg Pro Asn Pro Gly Leu Arg Met Pro Gly Arg 
740 745 750 

Tyr Glu Val Arg Ala Glu Val Gly Asn Gly Val Ser Arg His Asn Leu
755 760 765

Ser Cys Ser Phe Asp Val Val Ser Pro Val Ala Gly Leu Arg Val Ile
770 775 780

Tyr Pro Ala Pro Arg Asp Gly Arg Leu Tyr Val Pro Thr Asn Gly Ser
785 790 795 800

Ala Leu Val Leu Gln Val Asp Ser Gly Ala Asn Ala Thr Ala Thr Ala
805 810 815

Arg Trp Pro Gly Gly Ser Val Ser Ala Arg Phe Glu Asn Val Cys Pro
820 825 830

Ala Leu Val Ala Thr Phe Val Pro Gly Cys Pro Trp Glu Thr Asn Asp
835 840 845

Thr Leu Phe Ser Val Val Ala Leu Pro Trp Leu Ser Glu Gly Glu His
850 855 860

Val Val Asp Val Val Val Glu Asn Ser Ala Ser Arg Ala Asn Leu Ser
865 870 875 880

Leu Arg Val Thr Ala Glu Glu Pro Ile Cys Gly Leu Arg Ala Thr Pro
885 890 895

Ser Pro Glu Ala Arg Val Leu Gln Gly Val Leu Val Arg Tyr Ser Pro
900 905 910

Val Val Glu Ala Gly Ser Asp Met Val Phe Arg Trp Thr Ile Asn Asp
915 920 925

Lys Gln Ser Leu Thr Phe Gln Asn Val Val Phe Asn Val Ile Tyr Gln
930 935 940

Ser Ala Ala Val Phe Lys Leu Ser Leu Thr Ala Ser Asn His Val Ser
945 950 955 960

Asn Val Thr Val Asn Tyr Asn Val \Tfcr Val GiirArg'Met Asn Arg Met \y 
965 970 975

Gln Gly Leu Gln Val Ser Thr Val Pro Ala Val Leu Ser Pro Asn Ala
980 985 990

Thr Leu Val Leu Thr Gly Gly Val Leu Val Asp Ser Ala Val Glu Val
995 1000 1005

Ala Phe Leu Trp Asn Phe Gly Asp Gly Glu Gln Ala Leu His Gln Phe
1010 1015 1020

Gln Pro Pro Tyr Asn Glu Ser Phe Pro Val Pro Asp Pro Ser Val Ala
1025 1030 1035 1040

Gln Val Leu Val Glu His Asn Val Met His Thr Tyr Ala Ala Pro Gly
1045 1050 1055

Glu Tyr Leu Leu Thr Val Leu Ala Ser Asn Ala Phe Glu Asn Leu Thr
1060 1065 1070

Gln Gln Val Pro Val Ser Val Arg Ala Ser Leu Pro Ser Val Ala Val
1075 1080 1085

Gly Val Ser Asp Gly Val Leu Val Ala Gly Arg Pro Val Thr Phe Tyr
1090 1095 1100

Pro His Pro Leu Pro Ser Pro Gly Gly Val Leu Tyr Thr Trp Asp Phe
1105 1110 1115 1120

Gly Asp Gly Ser Pro Val Leu Thr Gln Ser Gln Pro Ala Ala Asn His
1125 1130 1135
Thr Tyr Ala Ser Arg Gly Thr Tyr His Val Arg Leu Glu Val Asn Asn
1140 1145 1150

Thr Val Ser Gly Ala Ala Ala Gln Ala Asp Val Arg Val Phe Glu Glu
1155 1160 1165

Leu Arg Gly Leu Ser Val Asp Met Ser Leu Ala Val Glu Gln Gly Ala
1170 1175 1180

Pro Val Val Val Ser Ala Ala Val Gln Thr Gly Asp Asn Ile Thr Trp
1185 1190 1195 1200

Thr Phe Asp Met Gly Asp Gly Thr Val Leu Ser Gly Pro Glu Ala Thr
1205 1210 1215

Val Glu His Val Tyr Leu Arg Ala Gln Asn Cys Thr Val Thr Val Gly
1220 1225 1230

Ala Ala Ser Pro Ala Gly His Leu Ala Arg Ser Leu His Val Leu Val
1235 1240 1245

Phe Val Leu Glu Val Leu Arg Val Glu Pro Ala Ala Cys Ile Pro Thr
1250 1255 1260

Gln Pro Asp Ala Arg Leu Thr Ala Tyr Val Thr Gly Asn Pro Ala His
1265 1270 1275 1280

Tyr Leu Phe Asp Trp Thr Phe Gly Asp Gly Ser Ser Asn Thr Thr Val 
1285 1290 1295 

Arg Gly Cys Pro Thr Val Thr His Asn Phe Thr Arg Ser Gly Thr Phe 
1300 1305 1310

Pro Leu Ala Leu Val Leu Ser Ser Arg Val Asn Arg Ala His Tyr Phe
1315 1320 1325

Thr Ser Ile Cys Val Glu Pro Glu Val Gly Asn Val Thr Leu Gln Pro
1330 1335 1340

Glu Arg Gln Phe Val Gln Leu Gly Asp Glu Ala Trp Leu Val Ala Cys
1345 1350 1355 1360

Ala Trp Pro Pro Phe Pro Tyr Arg Tyr Thr Trp Asp Phe Gly Thr Glu
1365 1370 1375

Glu Ala Ala Pro Thr Arg Ala Arg Gly Pro Glu Val Thr Phe Ile Tyr
1380 1385 1390

Arg Asp Pro Gly Ser Tyr Leu Val Thr Val Thr Ala Ser Asn Asn Ile
1395 1400 1405

Ser Ala Ala Asn Asp Ser Ala Leu Val Glu Val Gln Glu Pro Val Leu
1410 1415 1420

Val Thr Ser Ile Lys Val Asn Gly Ser Leu Gly Leu Glu Leu Gln Gln
1425 1430 1435 1440

Pro Tyr Leu Phe Ser Ala Val Gly Arg Gly Arg Pro Ala Ser Tyr Leu
1445 1450 1455
Trp Asp Leu Gly Asp Gly Gly Trp Leu Glu Gly Pro Glu Val Thr His
1460 1465 1470

Ala Tyr Asn Ser Thr Gly Asp Phe Thr Val Arg Val Ala Gly Trp Asn
1475 1480 1485

Glu Val Ser Arg Ser Glu Ala Trp Leu Asn Val Thr Val Lys Arg Arg
1490 1495 1500

Val Arg Gly Leu Val Val Asn Ala Ser Arg Thr Val Val Pro Leu Asn
1505 1510 1515 1520

Gly Ser Val Ser Phe Ser Thr Ser Leu Glu Ala Gly Ser Asp Val Arg
1525 1530 1535

Tyr Ser Trp Val Leu Cys Asp Arg Cys Thr Pro Ile Pro Gly Gly Pro
1540 1545 1550

Thr Ile Ser Tyr Thr Phe Arg Ser Val Gly Thr Phe Asn Ile Ile Val
1555 1560 1565

Thr Ala Glu Asn Glu Val Gly Ser Ala Gln Asp Ser Ile Phe Val Tyr
1570 1575 1580

Val Leu Gln Leu Ile Glu Gly Leu Gln Val Val Gly Gly Gly Arg Tyr
1585 1590 1595 1600

Phe Pro Thr Asn His Thr Val Gln Leu Gln Ala Val Val Arg Asp Gly
1605 1610 1615

Thr Asn Val Ser Tyr Ser Trp Thr Ala Trp Arg Asp Arg Gly Pro Ala
1620 1625 1630

Leu Ala Gly Ser Gly Lys Gly Phe Ser Leu Thr Val Leu Glu Ala Gly
1635 1640 1645

Thr Tyr His Val Gln Leu Arg Ala Thr Asn Met Leu Gly Ser Ala Trp
1650 1655 1660

Ala Asp Cys Thr Met Asp Phe Val Glu Pro Val Gly Trp Leu Met Val
1665 1670 1675 1680

Thr Ala Ser Pro Asn Pro Ala Ala Val Asn Thr Ser Val Thr Leu Ser
1685 1690 1695

Ala Glu Leu Ala Gly Gly Ser Gly Val Val Tyr Thr Trp Ser Leu Glu
1700 1705 1710


Glu Gly Leu Ser Trp Glu Thr Ser Glu Pro Phe Thr Thr His Ser Phe 
1715 1720 1725

Pro Thr Pro Gly Leu His Leu Val Thr Met Thr Ala Gly Asn Pro Leu
1730 1735 1740

Gly Ser Ala Asn Ala Thr Val Glu Val Asp Val Gln Val Pro Val Ser
1745 1750 1755 1760

Gly Leu Ser Ile Arg Ala Ser Glu Pro Gly Gly Ser Phe Val Ala Ala
1765 1770 1775
Gly Ser Ser Val Pro Phe Trp Gly Gln Leu Ala Thr Gly Thr Asn Val
1780 1785 1790

Ser Trp Cys Trp Ala Val Pro Gly Gly Ser Ser Lys Arg Gly Pro His
1795 1800 1805

Val Thr Met Val Phe Pro Asp Ala Gly Thr Phe Ser Ile Arg Leu Asn
1810 1815 1820

Ala Ser Asn Ala Val Ser Trp Val Ser Ala Thr Tyr Asn Leu Thr Ala
1825 1830 1835 1840

Glu Glu Pro Ile Val Gly Leu Val Leu Trp Ala Ser Ser Lys Val Val
1845 1850 1855

Ala Pro Gly Gln Leu Val His Phe Gln Ile Leu Leu Ala Ala Gly Ser
1860 1865 1870

Ala Val Thr Phe Arg Leu Gln Val Gly Gly Ala Asn Pro Glu Val Leu
1875 1880 1885

Pro Gly Pro Arg Phe Ser His Ser Phe Pro Arg Val Gly Asp His Val
1890 1895 1900

Val Ser Val Arg Gly Lys Asn His Val Ser Trp Ala Gln Ala Gln Val
1905 1910 1915 1920

Arg Ile Val Val Leu Glu Ala Val Ser Gly Leu Gln Met Pro Asn Cys
1925 1930 1935

Cys Glu Pro Gly Ile Ala Thr Gly Thr Glu Arg Asn Phe Thr Ala Arg
1940 1945 1950

Val Gln Arg Gly Ser Arg Val Ala Tyr Ala Trp Tyr Phe Ser Leu Gln
1955 1960 1965

Lys Val Gln Gly Asp Ser Leu Val Ile Leu Ser Gly Arg Asp Val Thr
1970 1975 1980

Tyr Thr Pro Val Ala Ala Gly Leu Leu Glu Ile Gln Val Arg Ala Phe
1985 1990 1995 2000

Asn Ala Leu Gly Ser Glu Asn Arg Thr Leu Val Leu Glu Val Gln Asp
2005 2010 2015

Ala Val Gln Tyr Val Ala Leu Gln Ser Gly Pro Cys Phe Thr Asn Arg
2020 2025 2030

Ser Ala Gln Phe Glu Ala Ala Thr Ser Pro Ser Pro Arg Arg Val Ala
2035 2040 2045

Tyr His Trp Asp Phe Gly Asp Gly Ser Pro Gly Gln Asp Thr Asp Glu
2050 2055 2060

Pro Arg Ala Glu His Ser Tyr Leu Arg Pro Gly Asp Tyr Arg Val Gln
2065 2070 2075 2080

Val Asn Ala Ser Asn Leu Val Ser Phe Phe Val Ala Gln Ala Thr Val
2085 2090 2095
Thr Val Gln Val Leu Ala Cys Arg Glu Pro Glu Val Asp Val Val Leu
2100 2105 2110

Pro Leu Gln Val Leu Met Arg Arg Ser Gln Arg Asn Tyr Leu Glu Ala
2115 2120 2125

His Val Asp Leu Arg Asp Cys Val Thr Tyr Gln Thr Glu Tyr Arg Trp
2130 2135 2140

Glu Val Tyr Arg Thr Ala Ser Cys Gln Arg Pro Gly Arg Pro Ala Arg
2145 2150 2155 2160

Val Ala Leu Pro Gly Val Asp Val Ser Arg Pro Arg Leu Val Leu Pro
2165 2170 2175

Arg Leu Ala Leu Pro Val Gly His Tyr Cys Phe Val Phe Val Val Ser
2180 2185 2190

Phe Gly Asp Thr Pro Leu Thr Gln Ser Ile Gln Ala Asn Val Thr Val
2195 2200 2205

Ala Pro Glu Arg Leu Val Pro Ile Ile Glu Gly Gly Ser Tyr Arg Val
2210 2215 2220

Trp Ser Asp Thr Arg Asp Leu Val Leu Asp Gly Ser Glu Ser Tyr Asp
2225 2230 2235 2240

Pro Asn Leu Glu Asp Gly Asp Gln Thr Pro Leu Ser Phe His Trp Ala
2245 2250 2255

Cys Val Ala Ser Thr Gln Arg Glu Ala Gly Gly Cys Ala Leu Asn Phe
2260 2265 2270

Gly Pro Arg Gly Ser Ser Thr Val Thr Ile Pro Arg Glu Arg Leu Ala
2275 2280 2285

Ala Gly Val Glu* Tyr Thr Phe Ser Leu Thr Vai Trp' Lys Ala Gly \Arn " -■* • 
2290 2295 2300

Lys Glu Glu Ala Thr Asn Gln Thr Val Leu Ile Arg Ser Gly Arg Val
2305 2310 2315 2320

Pro Ile Val Ser Leu Glu Cys Val Ser Cys Lys Ala Gln Ala Val Tyr
2325 2330 2335

Glu Val Ser Arg Ser Ser Tyr Val Tyr Leu Glu Gly Arg Cys Leu Asn
2340 2345 2350

Cys Ser Ser Gly Ser Lys Arg Gly Arg Trp Ala Ala Arg Thr Phe Ser
2355 2360 2365

Asn Lys Thr Leu Val Leu Asp Glu Thr Thr Thr Ser Thr Gly Ser Ala
2370 2375 2380

Gly Met Arg Leu Val Leu Arg Arg Gly Val Leu Arg Asp Gly Glu Gly
2385 2390 2395 2400

Tyr Thr Phe Thr Leu Thr Val Leu Gly Arg Ser Gly Glu Glu Glu Gly
2405 2410 2415
Cys Ala Ser Ile Arg Leu Ser Pro Asn Arg Pro Pro Leu Gly Gly Ser
2420 2425 2430

Cys Arg Leu Phe Pro Leu Gly Ala Val His Ala Leu Thr Thr Lys Val
2435 2440 2445

His Phe Glu Cys Thr Gly Trp His Asp Ala Glu Asp Ala Gly Ala Pro
2450 2455 2460

Leu Val Tyr Ala Leu Leu Leu Arg Arg Cys Arg Gln Gly His Cys Glu
2465 2470 2475 2480

Glu Phe Cys Val Tyr Lys Gly Ser Leu Ser Ser Tyr Gly Ala Val Leu
2485 2490 2495

Pro Pro Gly Phe Arg Pro His Phe Glu Val Gly Leu Ala Val Val Val
2500 2505 2510

Gln Asp Gln Leu Gly Ala Ala Val Val Ala Leu Asn Arg Ser Leu Ala
2515 2520 2525

Ile Thr Leu Pro Glu Pro Asn Gly Ser Ala Thr Gly Leu Thr Val Trp
2530 2535 2540

Leu His Gly Leu Thr Ala Ser Val Leu Pro Gly Leu Leu Arg Gln Ala
2545 2550 2555 2560

Asp Pro Gln His Val Ile Glu Tyr Ser Leu Ala Leu Val Thr Val Leu
2565 2570 2575

Asn Glu Tyr Glu Arg Ala Leu Asp Val Ala Ala Glu Pro Lys His Glu
2580 2585 2590

Arg Gln His Arg Ala Gln Ile Arg Lys Asn Ile Thr Glu Thr Leu Val
2595 2600 2605

Ser Leu Arg Val His Thr Val Asp Asp Ile Gln Gln Ile Ala Ala Ala
2610 2615 2620

Leu Ala Gln Cys Met Gly Pro Ser Arg Glu Leu Val Cys Arg Ser Cys
2625 2630 2635 2640

Leu Lys Gln Thr Leu His Lys Leu Glu Ala Met Met Leu Ile Leu Gln
2645 2650 2655

Ala Glu Thr Thr Ala Gly Thr Val Thr Pro Thr Ala Ile Gly Asp Ser
2660 2665 2670

Ile Leu Asn Ile Thr Gly Asp Leu Ile His Leu Ala Ser Ser Asp Val
2675 2680 2685

Arg Ala Pro Gin Pro Ser Glu hkx Gly Ala Glu Ser Pro Ser Arg Met" 
2690 2695 2700 

Val Ala Ser Gln Ala Tyr Asn Leu Thr Ser Ala Leu Met Arg Ile Leu
2705 2710 2715 2720

Met Arg Ser Arg Val Leu Asn Glu Glu Pro Leu Thr Leu Ala Gly Glu
2725 2730 2735






Glu Ile Val Ala Gln Gly Lys Arg Ser Asp Pro Arg Ser Leu Leu Cys
2740 2745 2750

Tyr Gly Gly Ala Pro Gly Pro Gly Cys His Phe Ser Ile Pro Glu Ala
2755 2760 2765

Phe Ser Gly Ala Leu Ala Asn Leu Ser Asp Val Val Gln Leu Ile Phe
2770 2775 2780

Leu Val Asp iSer Asn' Pro Pte Pro' Pte Gly T^ lie " S^ Asn Tyr 'Thr 
2785 2790 2795 2800

Val Ser Thr Lys Val Ala Ser Met Ala Phe Gln Thr Gln Ala Gly Ala
2805 2810 2815

Gln Ile Pro Ile Glu Arg Leu Ala Ser Glu Arg Ala Ile Thr Val Lys
2820 2825 2830

Val Pro Asn Asn Ser Asp Trp Ala Ala Arg Gly His Arg Ser Ser Ala
2835 2840 2845

Asn Ser Ala Asn Ser Val Val Val Gln Pro Gln Ala Ser Val Gly Ala
2850 2855 2860

Val Val Thr Leu Asp Ser Ser Asn Pro Ala Ala Gly Leu His Leu Gln
2865 2870 2875 2880

Leu Asn Tyr Thr Leu Leu Asp Gly His Tyr Leu Ser Glu Glu Pro Glu
2885 2890 2895

Pro Tyr Leu : Ala. Val Tyr Leu His Ser .Glu Pro Arg Pro Asi Glu His ■ 
2900 2905 2910

Asn Cys Ser Ala Ser Arg Arg Ile Arg Pro Glu Ser Leu Gln Gly Ala
2915 2920 2925

Asp His Arg Pro Tyr Thr Phe Phe Ile Ser Pro Gly Ser Arg Asp Pro
2930 2935 2940

Ala Gly Ser Tyr His Leu Asn Leu Ser Ser His Phe Arg Trp Ser Ala
2945 2950 2955 2960

Leu Gln Val Ser Val Gly Leu Tyr Thr Ser Leu Cys Gln Tyr Phe Ser
2965 2970 2975

Glu Glu Asp Met Val Trp Arg Thr Glu Gly Leu Leu Pro Leu Glu Glu
2980 2985 2990

Thr Ser Pro Arg Gln Ala Val Cys Leu Thr Arg His Leu Thr Ala Phe
2995 3000 3005

Gly Ala Ser Leu Phe Val Pro Pro Ser His Val Arg Phe Val Phe Pro
3010 3015 3020

Glu Pro Thr Ala Asp Val Asn Tyr Ile Val Met Leu Thr Cys Ala Val
3025 3030 3035 3040

Cys Leu Val Thr Tyr Met Val Met Ala Ala Ile Leu His Lys Leu Asp
3045 3050 3055
Gln Leu Asp Ala Ser Arg Gly Arg Ala Ile Pro Phe Cys Gly Gln Arg
3060 3065 3070

Gly Arg Phe Lys Tyr Glu Ile Leu Val Lys Thr Gly Trp Gly Arg Gly
3075 3080 3085

Ser Gly Thr Thr Ala His Val Gly Ile Met Leu Tyr Gly Val Asp Ser
3090 3095 3100

Arg Ser Gly His Arg His Leu Asp Gly Asp Arg Ala Phe His Arg Asn
3105 3110 3115 3120

Ser Leu Asp Ile Phe Arg Ile Ala Thr Pro His Ser Leu Gly Ser Val
3125 3130 3135

Trp Lys Ile Arg Val Trp His Asp Asn Lys Gly Leu Ser Pro Ala Trp
3140 3145 3150

Phe Leu Gln His Val Ile Val Arg Asp Leu Gln Thr Ala Arg Ser Ala
3155 3160 3165

Phe Phe Leu Val Asn Asp Trp Leu Ser Val Glu Thr Glu Ala Asn Gly
3170 3175 3180

Gly Leu Val Glu Lys Glu Val Leu Ala Ala Ser Asp Ala Ala Leu Leu
3185 3190 3195 3200

Arg Phe Arg Arg Leu Leu Val Ala Glu Leu Gln Arg Gly Phe Phe Asp
3205 3210 3215

Lys His Ile Trp Leu Ser Ile Trp Asp Arg Pro Pro Arg Ser Arg Phe
3220 3225 3230

Thr Arg Ile Gln Arg Ala Thr Cys Cys Val Leu Leu Ile Cys Leu Phe
3235 3240 3245

Leu Gly Ala Asn Ala Val Trp Tyr Gly Ala Val Gly Asp Ser Ala Tyr
3250 3255 3260

Ser Thr Gly His Val Ser Arg Leu Ser Pro Leu Ser Val Asp Thr Val
3265 3270 3275 3280

Ala Val Gly Leu Val Ser Ser Val Val Val Tyr Pro Val Tyr Leu Ala
3285 3290 3295

Ile Leu Phe Leu Phe Arg Met Ser Arg Ser Lys Val Ala Gly Ser Pro
3300 3305 3310

Ser Pro Thr Pro Ala Gly Gln Gln Val Leu Asp Ile Asp Ser Cys Leu
3315 3320 3325

Asp Ser Ser Val Leu Asp Ser Ser Phe Leu Thr Phe Ser Gly Leu His
3330 3335 3340

Ala Glu Ala Phe Val Gly Gln Met Lys Ser Asp Leu Phe Leu Asp Asp
3345 3350 3355 3360

Ser Lys Ser Leu Val Cys Trp Pro Ser Gly Glu Gly Thr Leu Ser Trp
3365 3370 3375
Pro Asp Leu Leu Ser Asp Pro Ser Ile Val Gly Ser Asn Leu Arg Gln
3380 3385 3390

Leu Ala Arg Gly Gln Ala Gly His Gly Leu Gly Pro Glu Glu Asp Gly
3395 3400 3405

Phe Ser Leu Ala Ser Pro Tyr Ser Pro Ala Lys Ser Phe Ser Ala Ser
3410 3415 3420

Asp Glu Asp Leu Ile Gln Gln Val Leu Ala Glu Gly Val Ser Ser Pro
3425 3430 3435 3440

Ala Pro Thr Gln Asp Thr His Met Glu Thr Asp Leu Leu Ser Ser Leu
3445 3450 3455

Ser Ser Thr Pro Gly Glu Lys Thr Glu Thr Leu Ala Leu Gln Arg Leu
3460 3465 3470

Gly Glu Leu Gly Pro Pro Ser Pro Gly Leu Asn Trp Glu Gln Pro Gln
3475 3480 3485

Ala Ala Arg Leu Ser Arg Thr Gly Leu Val Glu Gly Leu Arg Lys Arg
3490 3495 3500

Leu Leu Pro Ala Trp Cys Ala Ser Leu Ala His Gly Leu Ser Leu Leu
3505 3510 3515 3520

Leu Val Ala Val Ala Val Ala Val Ser Gly Trp Val Gly Ala Ser Phe
3525 3530 3535

Pro Pro Gly Val Ser Val Ala Trp Leu Leu Ser Ser Ser Ala Ser Phe
3540 3545 3550

Leu Ala Ser Phe Leu Gly Trp Glu Pro Leu Lys Val Leu Leu Glu Ala
3555 3560 3565

Leu Tyr Phe Ser Leu Val Ala Lys Arg Leu His Pro Asp Glu Asp Asp
3570 3575 3580

Thr Leu Val Glu Ser Pro Ala Val Thr Pro Val Ser Ala Arg Val Pro 
3585 3590 3595 3600 

Arg Val Arg Pro Pro His Gly Phe Ala Leu Phe Leu Ala Lys Glu Glu
3605 3610 3615

Ala Arg Lys Val Lys Arg Leu His Gly Met Leu Arg Ser Leu Leu Val
3620 3625 3630

Tyr Met Leu Phe Leu Leu Val Thr Leu Leu Ala Ser Tyr Gly Asp Ala
3635 3640 3645

Ser Cys His Gly His Ala Tyr Arg Leu Gln Ser Ala Ile Lys Gln Glu
3650 3655 3660

Leu His Ser Arg Ala Phe Leu Ala Ile Thr Arg Ser Glu Glu Leu Trp
3665 3670 3675 3680

Pro Trp Met Ala His Val Leu Leu Pro Tyr Val His Gly Asn Gln Ser
3685 3690 3695

Ser Pro Glu Leu Gly Pro Pro Arg Leu Arg Gln Val Arg Leu Gln Glu
3700 3705 3710

Ala Leu Tyr Pro Asp Pro Pro Gly Pro Arg Val His Thr Cys Ser Ala 
3715 3720 3725 

Ala Gly Gly Phe Ser Thr Ser Asp Tyr Asp Val Gly Trp Glu Ser Pro
3730 3735 3740

His Asn Gly Ser Gly Thr Trp Ala Tyr Ser Ala Pro Asp Leu Leu Gly
3745 3750 3755 3760

Ala Trp Ser Trp Gly Ser Cys Ala Val Tyr Asp Ser Gly Gly Tyr Val
3765 3770 3775

Gln Glu Leu Gly Leu Ser Leu Glu Glu Ser Arg Asp Arg Leu Arg Phe
3780 3785 3790

Leu Gln Leu His Asn Trp Leu Asp Asn Arg Ser Arg Ala Val Phe Leu
3795 3800 3805

Glu Leu Thr Arg Tyr Ser Pro Ala Val Gly Leu His Ala Ala Val Thr
3810 3815 3820

Leu Arg Leu Glu Phe Pro Ala Ala Gly Arg Ala Leu Ala Ala Leu Ser
3825 3830 3835 3840

Val Arg Pro Phe Ala Leu Arg Arg Leu Ser Ala Gly Leu Ser Leu Pro
3845 3850 3855

Leu Leu Thr Ser Val Cys Leu Leu Leu Phe Ala Val His Phe Ala Val
3860 3865 3870

Ala Glu Ala Arg Thr Trp His Arg Glu Gly Arg Trp Arg Val Leu Arg
3875 3880 3885

Leu Gly Ala Trp Ala Arg Trp Leu Leu Val Ala Leu Thr Ala Ala Thr
3890 3895 3900

Ala Leu Val Arg Leu Ala Gln Leu Gly Ala Ala Asp Arg Gln Trp Thr
3905 3910 3915 3920

Arg Phe Val Arg Gly Arg Pro Arg Arg Phe Thr Ser Phe Asp Gln Val
3925 3930 3935

Ala His Val Ser Ser Ala Ala Arg Gly Leu Ala Ala Ser Leu Leu Phe
3940 3945 3950

Leu Leu Leu Val Lys Ala Ala Gln His Val Arg Phe Val Arg Gln Trp
3955 3960 3965

Ser Val Phe Gly Lys Thr Leu Cys Arg Ala Leu Pro Glu Leu Leu Gly
3970 3975 3980

Val Thr Leu Gly Leu Val Val Leu Gly Val Ala Tyr Ala Gln Leu Ala
3985 3990 3995 4000

Ile Leu Leu Val Ser Ser Cys Val Asp Ser Leu Trp Ser Val Ala Gln
4005 4010 4015

Ala Leu Leu Val Leu Cys Pro Gly Thr Gly Leu Ser Thr Leu Cys Pro
4020 4025 4030

Ala Glu Ser Trp His Leu Ser Pro Leu Leu Cys Val Gly Leu Trp Ala
4035 4040 4045

Leu Arg Leu Trp Gly Ala Leu Arg Leu Gly Ala Val Ile Leu Arg Trp
4050 4055 4060

Arg Tyr His Ala Leu Arg Gly Glu Leu Tyr Arg Pro Ala Trp Glu Pro
4065 4070 4075 4080

Gln Asp Tyr Glu Met Val Glu Leu Phe Leu Arg Arg Leu Arg Leu Trp
4085 4090 4095

Met Gly Leu Ser Lys Val Lys Glu Phe Arg His Lys Val Arg Phe Glu
4100 4105 4110

Gly Met Glu Pro Leu Pro Ser Arg Ser Ser Arg Gly Ser Lys Val Ser
4115 4120 4125

Pro Asp Val Pro Pro Pro Ser Ala Gly Ser Asp Ala Ser His Pro Ser
4130 4135 4140

Thr Ser Ser Ser Gln Leu Asp Gly Leu Ser Val Ser Leu Gly Arg Leu
4145 4150 4155 4160

Gly Thr Arg Cys Glu Pro Glu Pro Ser Arg Leu Gln Ala Val Phe Glu
4165 4170 4175

Ala Leu Leu Thr Gln Phe Asp Arg Leu Asn Gln Ala Thr Glu Asp Val
4180 4185 4190

Tyr Gln Leu Glu Gln Gln Leu His Ser Leu Gln Gly Arg Arg Ser Ser
4195 4200 4205

Arg Ala Pro Ala Gly Ser Ser Arg Gly Pro Ser Pro Gly Leu Arg Pro
4210 4215 4220

Ala Leu Pro Ser Arg Leu Ala Arg Ala Ser Arg Gly Val Asp Leu Ala
4225 4230 4235 4240

Thr Gly Pro Ser Arg Thr Pro Ser Gly Gln Glu Gln Gly Pro Pro Gln
4245 4250 4255

Gln His Leu Val Leu Leu Pro Gly Gly Gly Gly Pro Trp Ser Arg Ser
4260 4265 4270

Gly His Arg Ser Val Leu Leu Ser Ala Ala Val Lys Ala Glu Gly Gln
4275 4280 4285

Ala Glu Trp Leu His Val Gly Ser Pro Glu Ser Arg Gln Gly His Leu
4290 4295 4300

Ser Val Cys Gly Leu Gln His Phe Lys Glu Ala Val Trp Pro Thr Arg
4305 4310 4315 4320

Thr Gln Gly Pro Leu Pro Ser Ser Leu Gly Lys Asp Thr Ala Val Leu
4325 4330 4335

Asp Gly Phe

Figure 10

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: (Compare Figure 7)

CTC AAC GAG GAG' COO CTG ACG CTG GOG QGC* GAG GAG ATC GTG GOG CAG^ ^ -* "48 
Leu Asn Glu Glu Pro Leu Thr Leu Ala Gly Glu Glu Ile Val Ala Gln
4340 4345 4350 ~' J 4355 

GGC AAG OGC TOG GAC COG. OGG AGO CTG CIG^TGC TAT /GGC GGC. GOC OCA - 96 
Gly Lys Arg Ser Asp Pro Arg Ser Leu Leu Cys Tyr Gly Gly Ala Pro
4360 4365 • 4370 

GGG OCT GGC TGC CAC TIC TOC ATC C0C GAG GCT TIC AGC GGG* GOC CTG 144 
Gly Pro Gly Cys His Phe Ser Ile Pro Glu Ala Phe Ser Gly Ala Leu
4375 4380 4385

GOC AAC CTC ACT GAC GIG GTG" CAG CTC ATC TTT CTG GTG GAC TOC AAT 192 
Ala Asn Leu Ser Asp Val Val Gin Leu He Phe Leu Val Asp Ser Asn 
4390 - 4395 . 4400 

OOC TTT OOC TTT GGC TAT ATC AGC AAC TAC- AOC GTC TOC AOC AAG- GTG 240 
Pro Phe Pro Phe Gly Tyr lie Ser Asn Tyr Thr Val Ser Thr Lys Val 
4405 r4410 * v ; , 4415 

GOC TOG ATG' GCA TTC CAG ACA GAG GCC GGC GOC CAG ATC OOC ATC GAG - "288 
Ala Ser Met Ala Phe Gln Thr Glu Ala Gly Ala Gln Ile Pro Ile Glu
4420 4425 4430 4435

OGG CTG GOC TCA GAG OGC GOC ATC AOC CTG AAG GIG OOC AAC. AAC TOG 336 
Arg Leu Ala Ser Glu Arg Ala He Thr Val Lys Val Pro Asn Asn Ser 
' --4440 4445 -4450-, - 

GAC TGG GCT GOC OGG GGC CAC OGC AGC TOO GCC AAC TOC GCC . AAC TOC 1 " 384 
Asp Trp Ala Ala Arg Gly His Arg Ser Ser Ala Asn Ser Ala Asn Ser 
*4455 y " 4460'" 4465 • 

GTT CTG GTC CAG OOC CAG GOC TOC GTC GGT GCT GTG GTC AOC CTG GAC ' 432 
Val Val Val Gln Pro Gln Ala Ser Val Gly Ala Val Val Thr Leu Asp
4470 4475 4480

AGC AGC AAC OCT GOG GOC GGG CTG CAT CTG CAG CTC AAC TAT AOG CTG , 480 
Ser Ser Asn Pro Ala Ala Gly . Leu His Leu Gin Leu Asn Tyr Thr Leu 
4485 . , 4490 1; 4495 

CTG GAC GGC CAC TAC CTG TCT GAG GAA OCT GAG 000 TAC CTG GCA GTC 528 
Leu Asp Gly- His Tyr Leu Ser Glu Glu Pro Glu Pro Tyr Leu Ala Val 
4500 ..4505" * . \ ; 4510 ; : '\ -/ 4515 - 

TAC CTA CAC TOG GAG OOC OGG OOC AAT GAG CAC AAC TGC TOG GCT AGC 576 
Tyr Leu His Ser Glu Pro Arg- Pro Asn Glu His Asn Cys Ser, Ala Ser 
• 4520 . v : _ ; 4525' 4530 

AGG AGG ATC OGC OCA GAG TCA CTC CAG GGT GCT GAC CAC OGG OCC TAC 624 
Arg Arg He Arg Pro Glu Ser Leu Gin Gly Ala Asp His Arg Pro Tyr 
4535 • 4540 .4545 . 7 

AOC TTC TTC ATT TOC COG GGG AGC AGA GAC OCA GOG GGG ACT TAC CAT 672 
Thr Phe Phe Ile Ser Pro Gly Ser Arg Asp Pro Ala Gly Ser Tyr His
4550 4555 4560

CTG AAC CTC TOC AGC CAC TTC OGC TGG TOG GOG CTG CAG GTG TOC GTG 720 
Leu Asn Leu Ser Ser His Phe Arg Trp Ser Ala Leu Gln Val Ser Val
4565 4570 4575

rrr era TAG AGG TOC CTG TGC CAG TAC TTC AGC GAG GAG GAC ATG GTC 768
Gly Leu Tyr Thr Ser Leu Cys Gln Tyr Phe Ser Glu Glu Asp Met Val
4580 4585 4590

TTG CGG AGA GAG GGG CTG CTG GOC CIG GAG GAG ACC TOG OOC CGC CAG 816
Trp Arg Thr Glu Gly Leu Leu Pro Leu Glu Glu Thr Ser Pro Arg Gln
4600 4605 4610

rrr-^TGCCTCACCCGCCACCTCACCGCCTTCGXGC^ 864
Ala Val Cys Leu Thr Arg His Leu Thr Ala Phe Gly Ala Ser Leu Phe
4615 4620

CTG CXT CCA AGC CAT GTC CGC TTT GTG TTT OCT GAG CCG ACA GOG GAT 912
Val Pro Pro Ser His Val Arg Phe Val Phe Pro Glu Pro Thr Ala Asp
4630 4635 4640

GTA AAC TAC ATC GTC ATG CTG ACA TGT GCT GTG TGC CTG GTC ACC TAG 960
Val Asn Tyr Ile Val Met Leu Thr Cys Ala Val Cys Leu Val Thr Tyr
4645 4650 4655

ATC GTC ATG GOC GCC ATC CTG GAG AAG CTG GAC CAG TTG GAT GOC AGC 1008
Met Val Met Ala Ala Ile Leu His Lys Leu Asp Gln Leu Asp Ala Ser
4660 4665 4670

CGG GGC CGC GOC ATC OCT TTC TGT GGG CAG CGG GGC CGC TTC AAG TAC 1056
Arg Gly Arg Ala Ile Pro Phe Cys Gly Gln Arg Gly Arg Phe Lys Tyr
4680 4685

GAG ATC CTC GTC AAG ACA GGC TGgWcGG GGC TCA GCT ACC ACS GCC 1104
Glu Ile Leu Val Lys Thr Gly Trp Gly Arg Gly Ser Gly Thr Thr Ala
4695 4700 4705

CAC GTG GGC ATCATC CTG TAT GGG GTG GAcIaGC CGG AGC GGC CAC 1152
His Val Gly Ile Met Leu Tyr Gly Val Asp Ser Arg Ser Gly His Arg
4710 4715 4720

CAC CTG GAC GGC GAC AGA GCC TTC CAC CGC AAC AGC CTG GAC ATC TTC 1200
His Leu Asp Gly Asp Arg Ala Phe His Arg Asn Ser Leu Asp Ile Phe
4725 4730 4735

CGG ATC GCC ACC COG CAC AGC CTG GCT XGC GTC TGG AAG ATC CGA GTG 1248
Arg Ile Ala Thr Pro His Ser Leu Gly Ser Val Trp Lys Ile Arg Val
4740 4745 4750 4755

TGG CAC GAC AAC AAA GGG CTC AGC CCT GCC TGC TTC CTG CAG CAC GTC 1296
Trp His Asp Asn Lys Gly Leu Ser Pro Ala Trp Phe Leu Gln His Val
4760 4765 4770

ATC GTC AGG GAC CTG CAG ACG OCA CGC AGG GCC TTC TTC GTG GTC; AAT 1344 
lie Val Arg 'Asp Leu Gin Thr Ala Arg Ser Ala Phe Phe Leu Val Asn 
.4775 4780 ... 4785 

GAC TQG'CTT TOG GTC GAG ACS GAG GCC AAC GGG GGC CTG GTC GAG AAG ,.- 1392 
Asp Trp Leu Ser Val Glu Thr Glu Ala Asn Gly Gly Leu Val Glu Lys 
4790 • 4795 _ ^ 4800 

GAG GTG CTC- GOC GOG AGC GAC GCA GCC CTT TTG CGC TTC CGG CJGC CTG - 1440 
Glu Val Leu Ala Ala- Ser Asp Ala Ala Leu leu Arg Phe Arg Arg Leu 
4805 4810 4815 

CTG GTG GCT- GAG CTG CAG OCT GGC TTC TTT^GAC-AAG CAC ATC TGG CTC '/'' 1488 
Leu Val Ala Glu Leu Gln Arg Gly Phe Phe Asp Lys His Ile Trp Leu
4820 4825 4830 4835

TOG ATA TOG GAC CGG COG OCT CGT AGC CGT TTC ACT OGC ATC CAG AGG 1536
Ser Ile Trp Asp Arg Pro Pro Arg Ser Arg Phe Thr Arg Ile Gln Arg
4840 4845 4850

GOC ADC TGC TGC GIT CTC CIC ATC TGC. CIC TIC CIG OGC GCC AAC GCC . 1584 
Ala Thr Cys Cys Val Leu leu He Cys Leu Phe Leu Gly Ala Asn Ala 
• 4855 4860 V" 4865 " 

GTG TGG TAG GGG GCT GIT GGC GAC TCT GOC TAG AGC AOG GGG CAT GIG 1632 
Val Trp Tyr Gly Ala Val Gly Asp Ser Ala Tyr Ser Thr Gly His Val. 
4870 4875 4880 

TOC'AGG CIG AGC COG CIG AGC GIC GAC AGA GIC GCT GTT GGC -CIG GIG 1680 
Ser Arg Leu Ser Pro Leu Ser Val Asp Thr Val Ala Val Gly Leu Val 
4885 " 4890 4895 

TOC AGC GIG GTT GIC TAT C0C GIC TAC CIG GCC ATC CIT TTT CIC TIC 1728 
Ser Ser Val Val Val Tyr Pro Val Tyr Leu Ala lie Leu Phe -Leu Phe, 
4900 4905 4910 4915 

COG ATG TOC OGG AGC AAG GIG GCT GGG AGC GOG AGC O0C.ACA OCT GOC . 1776 
Arg Met Ser. Arg Ser Lys Val Ala Gly Ser Pro Ser Pro Thr Pro Ala - 
4920 .4925 n . 4930 

GGG CAG CAG GIG CIG GAC ATC GAC AGC TGDJCIG GAC TOG TOC GTG CIG : 1824 
Gly Gin Gin Val Leu Asp He Asp. Ser Cys; Leu Asp Ser Ser Val Leu 
4935 . 4940 0c;/ 4945 <.: 

GAC AGC TOC "TIC CIC ACG TTC TCA GGC CIC CAC GCT. GAG GOC TTT GIT 1872 
Asp Ser Ser Phe Leu Thr. Phe Ser Gly LeuIHis Ala Glu-Ala Phe Val; • ' 
4950 4955 4960 

GGA CAG ATG AAG ACT GAC TTG TTT CIG GAT "GAT TCT AAG ACT CIG GIG'" ' 1920 
Gly Gin Met' Lys Ser Asp Leu Phe Leu Asp Asp -Ser Lys Ser Leu Val 
4965 4970 4975 

TGC TGG O0C TOC GGC GAG GGA AOG CIC' AGT TGG COG GAC CIG CIC ACT " 1968 
Cys Trp Pro' Ser Gly Glu Gly Thr Leu Ser Trp Pro Asp Leu Leu Ser 
4980 ■ 4985 4990 4995 

GAC COG TOC ATT GIG GGT AGC AAT CTG OGG CAG CIG GCA OGG GGC CAG 2016 
Asp Pro Ser He Val Gly Ser Asn Leu- Arg Gin Leu Ala Arg Gly Gin 
5000 " '5005 5010 

GOG GGC CAT GGG CIG GGC OCA GAG GAG GAC GGC TIC* TOC. CIG GOC AGC 2064 
Ala Gly His Gly Leu Gly Pro Glu Glu Asp Gly Phe Ser Leu Ala Ser 
5015 - 5020 5025 

00C TAC TOG OCT GOC AAA TOC TTC TCA' GCA TCA GAT GAA GAC CIG ATC * 2112 
Pro Tyr Ser Pro Ala Lys Ser : Phe Ser Ala" Ser* Asp Glu Asp Lai lie 
5030 5035 5040 

CAG CAG GIC CIT GCC GAG GGG GIC. AGC AGC, OCA GCC . OCT ACC CAA GAC 2160 
Gin Gin Val Leu Ala Glu Gly Val Ser Ser Pro Ala Pro Thr Gin Asp 
5045 5050* 5055 

A0C CAC ATG GAA AOG GAC CTG CIC AGC AGC CIG TOC AGC ACT OCT GGG 2208
Thr His Met Glu Thr Asp Leu Leu Ser Ser Leu Ser Ser Thr Pro Gly
5060 5065 5070 5075

GAG AAG ACA GAG AOG CTC GOG CIG CAG AGG CTG GGG GAG CTC GOG OCA 2256
Glu Lys Thr Glu Thr Leu Ala Leu Gln Arg Leu Gly Glu Leu Gly Pro
5080 5085 5090

C0C AGO OCA GGC CTG AAC TOG GAA CAG COC CAG GCA GOG AGG CTG TOC 2304
Pro Ser Pro Gly Leu Asn Trp Glu Gln Pro Gln Ala Ala Arg Leu Ser
5095 5100 5105

AGG ACA GGA CIG GIG GAG GOT CTC CGG AAG OGC CTG CTG CCG GOC TOG 2352
Arg Thr Gly Leu Val Glu Gly Leu Arg Lys Arg Leu Leu Pro Ala Trp
5110 5115 5120

TCT GOC TOC CTG GGC CAC GGG CIC AGC CTG CTC CTG Gro^QCr GIG GCT 2400
Cys Ala Ser Leu Ala His Gly Leu Ser Leu Leu Leu Val Ala Val Ala
5125 5130 5135

GIG GCT GTC TCA GGG TOG GTC GCT GOG AGC TTC OOC COG GGC GIG ACT 2448
Val Ala Val Ser Gly Trp Val Gly Ala Ser Phe Pro Pro Gly Val Ser
5140 5145 5150 5155

GTT GOG TOG CTC CTC TOC AGC AGC GOC AGC TTC CTG GOC TCA TIC CTC 2496
Val Ala Trp Leu Leu Ser Ser Ser Ala Ser Phe Leu Ala Ser Phe Leu
5160 5165 5170

GGC TOG GAG OCA CTG AAG GIG TTC GIG GAA GOC CTG TAC TTC TCA CIG 2544
Gly Trp Glu Pro Leu Lys Val Leu Leu Glu Ala Leu Tyr Phe Ser Leu
5175 5180 5185

GTC GOC AAG CGG CTG CAC COG GAT GAA GAT GAC ACC CTG GTA GAG AGC 2592
Val Ala Lys Arg Leu His Pro Asp Glu Asp Asp Thr Leu Val Glu Ser
5190 5195 5200

COG GCT GIG AOG OCT GIG. AGC GCA OCT/ GIG OOC OGC GTA OGGtCCA GOC 2640 
Pro Ala Val Thr Pro Val Ser Ala Arg Val Pro Arg Val Arg Pro Pro
5205 5210 5215 * 

CAC GGC TIT- GCA CTC TTC CIG GOC AAG GAA GAA GOC t CGC AAG GIG AAG v 2688" 
His Gly Phe Ala Leu Phe Leu Ala Lys Glu. Glu Ala Arg Lys Val Lys- 
5220 . 5225 . 5230 5235 % 

AGG CPA CAT GGC ATC CTC CGG AGC CTC -CTC GIG TAG ATC. CTT TTT CIG 2736 
Arg Leu His Gly Met Leu Arg Ser Leu Leu Val Tyr Met Leu Phe Leu
/ 5240 * 5245 - 5250 

CTG GIG ACC CTG CIG GOC AGC TAT GGG. GAT GOG -TCA -TOC CAT GGG CAG 2784 
Leu Val Thr Leu Leu Ala Ser Tyr Gly Asp Ala Ser Cys His Gly His
5255 5260 5265

GOC TAC OCT CIG CAA AGC. GOC ATC -AAG CAG GAG CIG CAG AGC GGG GOC - 2832 
Ala Tyr Arg Leu Gln Ser Ala Ile Lys Gln Glu Leu His Ser Arg Ala
5270 5275 5280

TTC CTG GOC ATC AOG OGG..TCT. GAG ..GAG CTC TOG CCA TOG ATC GOC CAC ~ 2880 
Phe Leu Ala Ile Thr Arg Ser Glu Glu Leu Trp Pro Trp Met Ala His
5285 5290 5295

GIG'CTG CIG C0C TAC GTC CAC GGG AAC CAG. TOC AGC CCA GAG CIG GGG . 2928 
Val Leu Leu Pro Tyr Val His Gly Asn Gln Ser Ser Pro Glu Leu Gly
5300 5305 5310 5315

OOC OCA OGG CTG CGG CAG GTG COG CTC CAG GAA GCA CTC TAC OCA GAC 2976
Pro Pro Arg Leu Arg Gin Val Arg Leu" Gin Glu Ala*. Leu Tyr Pro Asp 

5320 - -5325 5330 * • . . 

OCT OOC GGC OOC AGG GIC CAC AOG TGC TOG GOC GCA GGA GGC TTC AGO 3024 
Pro Pro Gly Pro Arg Val His Thr Cys Ser Ala Ala Gly Gly Phe Ser '. 
5335 5340 /" < 5345 

AOC AGC GAT TAC GAC GTT GGC TOG GAG ACT OCT CAC AAT GGC TOG GGG 3072 
Thr Ser Asp Tyr Asp- Val Gly Trp Glu Ser Pro His Asn Gly Ser Gly . . 
5350 5355 . . . 5360 

AOG TGG GOC TAT TCA GOG COG GAT CTG CTG GGG GCA TGG T0C TGG GGC 3120 
Thr Trp Ala Tyr Ser Ala Pro Asp Leu Leu Gly Ala Trp Ser Trp Gly
5365 5370 5375

TCC TGT GCC GIG TAT GAC AGC GGG GOC TAC GIG CAG GAG CTG GGC CTG 3168 
Ser Cys Ala Val . Tyr Asp Ser. Gly Gly Tyr- Val Gin Glu Leu Gly* Leu 
5380 5385 5390 ; 5395 

AGC CTG GAG GAG AGC 0GC GAC OGG CTG CGC TTC CTG CAG CTG CAC AAC 3216 
Ser Leu Glu Glu Ser. Arg Asp Arg Leu. Arg Phe Leu Gin Leu His Asn * "" 
- 5400 .5405 ' • < 5410 

TGG CTG GAC AAC AGG AGC CGC GOT GTG TTC CTG GAG CTC AOG CGC TAC 3264 
Trp Leu Asp Asn Arg . Ser Arg Ala .Val, Phe' Leu Glu ~ Leu Thr Arg Tyr ' 
5415 . ,-5420 ,: '; , - 5425 ' 

AGC 00G GOC GIG GGG CTG CAC GOC GOC GTC AOG CIG OGC CTC GAG TTC 3312 
Ser Pro Ala Val Gly Leu His Ala Ala Val Thr Leu Arg Leu Glu Phe
5430 5435 5440

COG GOG GOC GGC OGC GOC CTG GOC GOC CTC AGC GTC OGC OOC TTT GOG 3360 
Pro Ala Ala Gly Arg Ala Leu Ala . Ala Leu "Ser Val Arg Pro Phe Ala 

5445 - 5450 . \ " 5455 " t. 

CTG OGC CGC CTC AGC GOG GGC CTC TOG CTG OCT CTG CTC ACC TOG GTG 3408 
Leu Arg Arg Leu Ser Ala Gly Leu Ser Leu Pro Leu - Leu Thr Ser - ' Val 
5460 5465 5470 5475 

TGC CTG CTG CTG TTC GOC GIG CAC TTC GOC GIG GOC GAG GOC OCT ACT 3456 
Cys Leu Leu Leu Phe Ala Val His Phe Ala Val . Ala Glu Ala. Arg Thr 
.5480 5485 5490 

TGG CAC AGG GAA GGG OGC TGG OGC GIG CIG OGG CTC GGA GOC TGG GOG 3504 
Trp His Arg Glu Gly Arg Trp Arg Val Leu Arg Leu Gly' Ala Trp Ala 
5495 . ' 5500 • 5505 

OGG TGG CTG CTG GIG GOG CTG AOG GOG GOC AOG GCA CTG GTA OGC CTC 3552 
Arg. Trp Leu Leu* Val Ala .Leu Thr Ala Ala Thr Ala - Leu Val Arg Leu 

5510 , 5515 ' . : 5520 ' • * 1~ ' 

GOC CAG CTG GGT GOC GCT GAC OGC CAG TGG AOC OGT TTC GIG OGC GGC 3600 
Ala Gln Leu Gly Ala Ala Asp Arg Gln Trp Thr Arg Phe Val Arg Gly
5525 5530 5535

OGC COG CGC OGC TTC ACT AGC TTC GAC CAG GIG GOG CAC GIG AGC TOC 3648 
Arg Pro Arg Arg Phe Thr Ser Phe Asp Gln Val Ala His Val Ser Ser
5540 5545 5550 5555

GCA GOC OCT GGC CTG GOG GOO TOG CTG CTC TTC CTG CTT TIG GTC AAG 3696
Ala Ala Arg Gly Leu Ala Ala Ser Leu Leu Phe Leu Leu Leu Val Lys
5560 5565 5570

GCT GOC CAG CAC GTA OGC TIC GTC OGG CAG TGG TOO GIC TIT GGC AAG 3744
Ala Ala Gln His Val Arg Phe Val Arg Gln Trp Ser Val Phe Gly Lys
5575 5580 5585

ACA TTA TGC OGA GET CTG OCA GAG CTC, CTG GGG' GTC. AOC TTG~GGC CTG 3792 
Thr Leu Cys Arg Ala Leu Pro-Glu Leu Leu Gly Val : Thr Leu Gly Leu " . 
5590 5595 5600 

CTG GTC CTC GGG GTA GOC TAC GOC CAG CTG "GOC ATC CTG CTC ' GTC .TOT .v\-. 3840 
Val Val Leu Gly Val Ala Tyr Ala Gin Leu Ala He Leu. Leu Val Ser 
5605 5610 5615 

TOO TCT GIG GAC.TOC .CTC TQG ; AGO GTC GOC CAG. GOC CTG ;TTG' GTC CTG \ ' 3888.. 
Ser Cys Val Asp Ser Leu Trp Ser Val. Ala Gin Ala Leu Leu Val Leu 
5620 5625 5630 - 5635 

TGC OCT GGG ACT GGG CTC TCT AOC CTG'-TCT OCT GOC GAG TOC'TGG CAC ' 3936 - 
Cys Pro Gly Thr Gly Leu Ser Thr Leu Cys Pro Ala Glu Ser Trp His 
5640 5645 5650 

CTG TCA GOC CTG. CTG TCT » GTC ! GGG CTC /TGG* GCA CTC OGG CTG TGG GGC 3984 /' 

Leu Ser Pro Leu Leu Cys Val Gly Leu Trp Ala "Leu Arg Leu Trp Gly 
5655 5660 5665 

GOC CTA OGG CTG GGG GCT GIT ATT CIC OGC TGG OGC-* TAC 1 CAC -GOC TTG* * 4032' 
Ala Leu Arg Leu Gly Ala Val '.lie Leu Arg Trp Arg. Tyr His Ala Leu ' . ~ 
5670 5675 5680 

OCT GGA GAG CTG TAC-CGG COG GOC TGG GAG 00C CAG GAC TAC GAG ATG? . : 4080 
Arg Gly Glu Leu Tyr Arg Pro Ala -Trp Glu Pro Gin Asp Tyr Glu Met 
5685 5690 5695 

GIG GAG TTG TTC CTG OGC .AGG CTG 'CGC CTC TGG ATG GGC : CTC AGO AAG. , 4128 
Val Glu Leu Phe Leu Arg Arg Leu Arg Leu Trp Met Gly Leu Ser Lys 
5700 5705 5710 5715 

GTC AAG GAG TTC OGG. CAC AAA GTC CGC TIT GAA GGG ATG GAG OOG CTG 4176 
Val Lys Glu Phe Arg His Lys Val Arg Pte Glu Gly Met Glu Pro Leu 
5720 5725 5730 

OOC TCT CGC TQC TOO AGG GGC TOC AAG CTA TOC OOG GAT .GTG CCC'.'CCA": rx • 4224-: 
Pro Ser Arg Ser Ser Arg Gly Ser Lys Val Ser Pro Asp Val Pro Pro 
5735 5740 5745 

OOC AGO GCT GX^TOC/GAT GCC'iTCX? OC COC TCC AOC .TOC "TOC AGO CAG 4272 * 

Pro Ser Ala Gly Ser Asp Ala .Ser His Pro Ser Thr. : Ser Ser Ser Gin 
5750 5755 5760 

CTG GAT GGG CIG AGO GTC AGC CTG GGG OGG. .CTG GGG ACA AGG TCT GAG . v 4320 
Leu Asp Gly Leu Ser Val Ser Leu Gly Arg Leu Gly Thr Arg Cys Glu 
5765 5770 5775 

OCT GAG OOC TOC OGC .CTC CAA- GOC GTC TTC GAG ^ GOC CTG CTC *AOC CAG ' 4368 
Pro Glu Pro Ser Arg Leu Gln Ala Val Phe Glu Ala Leu Leu Thr Gln
5780 5785 5790 5795

TTT GAC CGA CTC AAC CAG QCC ACA GAG GAC GTC TAG CAG CTG GAG GAG 4416
Phe Asp Arg Leu Asn Gln Ala Thr Glu Asp Val Tyr Gln Leu Glu Gln
5800 5805 5810

CAG CTG CAC AGC CTG CAA GGC OGC AGG AGC AGC OGG GOG OOC GOC GGA 4464
Gln Leu His Ser Leu Gln Gly Arg Arg Ser Ser Arg Ala Pro Ala Gly
5815 5820 5825

TCT TOC OGT GGC OCA T0C COG GGC CTG CGG CCA GCA CTG OOC AGC OGC 4512 
Ser Ser Arg Gly Pro Ser Pro Gly Leu Arg Pro Ala Leu Pro Ser Arg 
5830 5835 5840 

CTT GOC CGG GCC ACT OGG GCT CTG GAC CTG GOC ACT GGC CCC AGC AGG . 4560 
Leu Ala Arg Ala Ser Arg Gly Val Asp Leu Ala Thr Gly Pro Ser Arg
5845 5850 5855 

ACA OCT TOG GGC CAA GAA CAA GGT CCA OOC CAG CAG CAC TTA CTC CTC ? 4608 
Thr Pro Ser Gly Gin Glu Gin Gly Pro Pro Gin Gin . His Leu Val Leu 
5860 5865 ' 5870 * 5875 " 

CTT OCT GGC GGG GGT GGG 00G TGG ACT OGG ACT GGA CAC OGC TCA GTA 4656... 
Leu Pro Gly Gly Gly Gly Pro Trp Ser Arg Ser Gly His Arg Ser Val 
5880 5885 5890 

TTA CTT TCT GOC GOT GTC AAG GOC GAG GGC CAG GCA GAA TGG CTG CAC 4704 
Leu Leu Ser Ala Ala Val Lys Ala Glu Gly Gin Ala Glu Trp Leu His 

5895 5900 , A 5905 ...... 

CTA OCT TCC OCA GAG AGC AGG CAG GGG CAT CTG TCT CTC TCT GGG CTT 4752 
Val Gly Ser Pro Glu Ser Arg Gln Gly His Leu Ser Val Cys Gly Leu
5910 * 5915 5920 

CAG CAC TTT AAA. GAG, OCT GIG TGG OCA ACE AGG AOC CAG GGT 00C CTC . / 4800 
Gin His Phe Lys Glu Ala Val Trp Pro Thr Arg Thr Gin Gly Pro Leu 
5925 5930 5935 

OOC AGC T0C CTT GGG AAG GAC ACA GCA CTA TIG GAC GGT TIC " ■ 4842 

Pro Ser Ser Leu Gly Lys Asp Thr Ala Val Leu Asp Gly Phe 
5940 5945 5950 . . 

TAGOCTCTGA GATGCTAATT TATTTCC00G ACTOCTCAGG TACAGCGGGC TCTG000GGC 4902 

C0CAOO00CT GGGCAGATGT CCCCCACTGC TAAGGCTGCT GGCTTCAGGG AGGGITAGOC 4962 
TGCAO0G00G OCAOOCTGCC CCTAAGTTAT TAOCTCTCCA CTICCTACOG TACTOOCTGC 5022

ACOCTCTCAC TGTGTGTCTC GTGTCAGTAA TTTATATGCT GTTAAAATCT CTATATTITT 5082 

CTATCTCACT ATTTTCACTA GGGCTGAGGG G0CTG0G0CC AGAGCTGGOC TOCCOGAACA 5142 

OCTGCTGOGC TTGCTAGGIG -TGGTGQOGTT ATGGCAGCOC GGCTGCTGCT .TGGATGOGAG 5202 

LTlUJULTlly GQOOGGTOCT GGGC3GCACAG CTGTCTGOCA GGCACTCTCA TCAOOOCAGA * 5262 

GGOCTTCTCA TCCTOOCTTG OOOCAGCSliA GCTAGCAAGA GW3CAGCG0C CAGGOCTGCT 5322" 

G3CATCAGGT CTGGGCAACT AGCAGGACTA GGCATGTCAG AGGA000CAG GGTGGTTAGA 5382 

GGAAAAGACT OCTOCTQGGG GCTGGCTOOC AGGGTGGAGG AAGCTGACTG TCTGTGICTG 5442" 

TUlUlOiaOG OGOGACGOQC GAGTCTGCTG TATGGCOCAG GCAGOCTCAA GG00CT0GGA 5502

GCTQGCTGTC U UlUL'^iC'llj, TCTJQCCACTT CTGTGGGCAT GGCOCTICT AGAQOCTOGA 5562

CACmXOCA ACOOCCGCAC CAAGCAGACA AAGrrCAATAA AAGAGCICTC TGACTGCAAA 5622

AAAAAAAAA 5631



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: (Compare Figure 7)

Leu Asn Glu Glu Pro Leu Thr Leu Ala Gly Glu Glu Ile Val Ala Gln
1 5 10 15

Gly Lys Arg Ser Asp Pro Arg Ser Leu Leu Cys Tyr Gly Gly Ala Pro
20 25 30

Gly Pro Gly Cys His Phe Ser Ile Pro Glu Ala Phe Ser Gly Ala Leu
35 40 45

Ala Asn Leu Ser Asp Val Val Gln Leu Ile Phe Leu Val Asp Ser Asn
50 55 60

Pro Phe Pro Phe Gly Tyr Ile Ser Asn Tyr Thr Val Ser Thr Lys Val
65 70 75 80

Ala Ser Met Ala Phe Gln Thr Glu Ala Gly Ala Gln Ile Pro Ile Glu
85 90 95

Arg Leu Ala Ser Glu Arg Ala Ile Thr Val Lys Val Pro Asn Asn Ser
100 105 110

Asp Trp Ala Ala Arg Gly His Arg Ser Ser Ala Asn Ser Ala Asn Ser
115 120 125

Val Val Val Gln Pro Gln Ala Ser Val Gly Ala Val Val Thr Leu Asp
130 135 140

Ser Ser Asn Pro Ala Ala Gly Leu His Leu Gln Leu Asn Tyr Thr Leu
145 150 155 160

Leu Asp Gly His Tyr Leu Ser Glu Glu Pro Glu Pro Tyr Leu Ala Val
165 170 175

Tyr Leu His Ser Glu Pro Arg Pro Asn Glu His Asn Cys Ser Ala Ser
180 185 190

Arg Arg Ile Arg Pro Glu Ser Leu Gln Gly Ala Asp His Arg Pro Tyr
195 200 205

Thr Phe Phe Ile Ser Pro Gly Ser Arg Asp Pro Ala Gly Ser Tyr His
210 215 220

Leu Asn Leu Ser Ser His Phe Arg Trp Ser Ala Leu Gln Val Ser Val
225 230 235 240

Gly Leu Tyr Thr Ser Leu Cys Gln Tyr Phe Ser Glu Glu Asp Met Val
245 250 255

Trp Arg Thr Glu Gly Leu Leu Pro Leu Glu Glu Thr Ser Pro Arg Gln
260 265 270

Ala Val Cys Leu Thr Arg His Leu Thr Ala Phe Gly Ala Ser Leu Phe
275 280 285

Val Pro Pro Ser His Val Arg Phe Val Phe Pro Glu Pro Thr Ala Asp
290 295 300

Val Asn Tyr Ile Val Met Leu Thr Cys Ala Val Cys Leu Val Thr Tyr
305 310 315 320

Met Val Met Ala Ala Ile Leu His Lys Leu Asp Gln Leu Asp Ala Ser
325 330 335

Arg Gly Arg Ala Ile Pro Phe Cys Gly Gln Arg Gly Arg Phe Lys Tyr
340 345 350

Glu Ile Leu Val Lys Thr Gly Trp Gly Arg Gly Ser Gly Thr Thr Ala
355 360 365

His Val Gly Ile Met Leu Tyr Gly Val Asp Ser Arg Ser Gly His Arg
370 375 380

His Leu Asp Gly Asp Arg Ala Phe His Arg Asn Ser Leu Asp Ile Phe
385 390 395 400

Arg Ile Ala Thr Pro His Ser Leu Gly Ser Val Trp Lys Ile Arg Val
405 410 415

Trp His Asp Asn Lys Gly Leu Ser Pro Ala Trp Phe Leu Gln His Val
420 425 430

Ile Val Arg Asp Leu Gln Thr Ala Arg Ser Ala Phe Phe Leu Val Asn
435 440 445

Asp Trp Leu Ser Val Glu Thr Glu Ala Asn Gly Gly Leu Val Glu Lys
450 455 460

Glu Val Leu Ala Ala Ser Asp Ala Ala Leu Leu Arg Phe Arg Arg Leu
465 470 475 480

Leu Val Ala Glu Leu Gln Arg Gly Phe Phe Asp Lys His Ile Trp Leu
485 490 495

Ser Ile Trp Asp Arg Pro Pro Arg Ser Arg Phe Thr Arg Ile Gln Arg
500 505 510

Ala Thr Cys Cys Val Leu Leu Ile Cys Leu Phe Leu Gly Ala Asn Ala
515 520 525

Val Trp Tyr Gly Ala Val Gly Asp Ser Ala Tyr Ser Thr Gly His Val
530 535 540

Ser Arg Leu Ser Pro Leu Ser Val Asp Thr Val Ala Val Gly Leu Val
545 550 555 560

Ser Ser Val Val Val Tyr Pro Val Tyr Leu Ala Ile Leu Phe Leu Phe
565 570 575




Arg Met Ser Arg Ser Lys Val Ala Gly Ser Pro Ser Pro Thr Pro Ala
580 585 590

Gly Gln Gln Val Leu Asp Ile Asp Ser Cys Leu Asp Ser Ser Val Leu
595 600 605

Asp Ser Ser Phe Leu Thr Phe Ser Gly Leu His Ala Glu Ala Phe Val
610 615 620

Gly Gln Met Lys Ser Asp Leu Phe Leu Asp Asp Ser Lys Ser Leu Val
625 630 635 640

Cys Trp Pro Ser Gly Glu Gly Thr Leu Ser Trp Pro Asp Leu Leu Ser
645 650 655

Asp Pro Ser Ile Val Gly Ser Asn Leu Arg Gln Leu Ala Arg Gly Gln
660 665 670

Ala Gly His Gly Leu Gly Pro Glu Glu Asp Gly Phe Ser Leu Ala Ser
675 680 685

Pro Tyr Ser Pro Ala Lys Ser Phe Ser Ala Ser Asp Glu Asp Leu Ile
690 695 700

Gln Gln Val Leu Ala Glu Gly Val Ser Ser Pro Ala Pro Thr Gln Asp
705 710 715 720

Thr His Met Glu Thr Asp Leu Leu Ser Ser Leu Ser Ser Thr Pro Gly
725 730 735

Glu Lys Thr Glu Thr Leu Ala Leu Gln Arg Leu Gly Glu Leu Gly Pro
740 745 750

Pro Ser Pro Gly Leu Asn Trp Glu Gln Pro Gln Ala Ala Arg Leu Ser
755 760 765

Arg Thr Gly Leu Val Glu Gly Leu Arg Lys Arg Leu Leu Pro Ala Trp
770 775 780

Cys Ala Ser Leu Ala His Gly Leu Ser Leu Leu Leu Val Ala Val Ala
785 790 795 800

Val Ala Val Ser Gly Trp Val Gly Ala Ser Phe Pro Pro Gly Val Ser
805 810 815

Val Ala Trp Leu Leu Ser Ser Ser Ala Ser Phe Leu Ala Ser Phe Leu
820 825 830

Gly Trp Glu Pro Leu Lys Val Leu Leu Glu Ala Leu Tyr Phe Ser Leu
835 840 845

Val Ala Lys Arg Leu His Pro Asp Glu Asp Asp Thr Leu Val Glu Ser
850 855 860

Pro Ala Val Thr Pro Val Ser Ala Arg Val Pro Arg Val Arg Pro Pro
865 870 875 880

His Gly Phe Ala Leu Phe Leu Ala Lys Glu Glu Ala Arg Lys Val Lys
885 890 895

Arg Leu His Gly Met Leu Arg Ser Leu Leu Val Tyr Met Leu Phe Leu
900 905 910

Leu Val Thr Leu Leu Ala Ser Tyr Gly Asp Ala Ser Cys His Gly His
915 920 925

Ala Tyr Arg Leu Gln Ser Ala Ile Lys Gln Glu Leu His Ser Arg Ala
930 935 940

Phe Leu Ala Ile Thr Arg Ser Glu Glu Leu Trp Pro Trp Met Ala His
945 950 955 960

Val Leu Leu Pro Tyr Val His Gly Asn Gln Ser Ser Pro Glu Leu Gly
965 970 975

Pro Pro Arg Leu Arg Gln Val Arg Leu Gln Glu Ala Leu Tyr Pro Asp
980 985 990

Pro Pro Gly Pro Arg Val His Thr Cys Ser Ala Ala Gly Gly Phe Ser
995 1000 1005

Thr Ser Asp Tyr Asp Val Gly Trp Glu Ser Pro His Asn Gly Ser Gly
1010 1015 1020

Thr Trp Ala Tyr Ser Ala Pro Asp Leu Leu Gly Ala Trp Ser Trp Gly
1025 1030 1035 1040

Ser Cys Ala Val Tyr Asp Ser Gly Gly Tyr Val Gln Glu Leu Gly Leu
1045 1050 1055

Ser Leu Glu Glu Ser Arg Asp Arg Leu Arg Phe Leu Gln Leu His Asn
1060 1065 1070

Trp Leu Asp Asn Arg Ser Arg Ala Val Phe Leu Glu Leu Thr Arg Tyr
1075 1080 1085

Ser Pro Ala Val Gly Leu His Ala Ala Val Thr Leu Arg Leu Glu Phe
1090 1095 1100

Pro Ala Ala Gly Arg Ala Leu Ala Ala Leu Ser Val Arg Pro Phe Ala
1105 1110 1115 1120

Leu Arg Arg Leu Ser Ala Gly Leu Ser Leu Pro Leu Leu Thr Ser Val
1125 1130 1135

Cys Leu Leu Leu Phe Ala Val His Phe Ala Val Ala Glu Ala Arg Thr
1140 1145 1150

Trp His Arg Glu Gly Arg Trp Arg Val Leu Arg Leu Gly Ala Trp Ala
1155 1160 1165

Arg Trp Leu Leu Val Ala Leu Thr Ala Ala Thr Ala Leu Val Arg Leu
1170 1175 1180

Ala Gln Leu Gly Ala Ala Asp Arg Gln Trp Thr Arg Phe Val Arg Gly
1185 1190 1195 1200

Arg Pro Arg Arg Phe Thr Ser Phe Asp Gln Val Ala His Val Ser Ser
1205 1210 1215

Ala Ala Arg Gly Leu Ala Ala Ser Leu Leu Phe Leu Leu Leu Val Lys
1220 1225 1230

Ala Ala Gln His Val Arg Phe Val Arg Gln Trp Ser Val Phe Gly Lys
1235 1240 1245

Thr Leu Cys Arg Ala Leu Pro Glu Leu Leu Gly Val Thr Leu Gly Leu
1250 1255 1260

Val Val Leu Gly Val Ala Tyr Ala Gln Leu Ala Ile Leu Leu Val Ser
1265 1270 1275 1280

Ser Cys Val Asp Ser Leu Trp Ser Val Ala Gln Ala Leu Leu Val Leu
1285 1290 1295

Cys Pro Gly Thr Gly Leu Ser Thr Leu Cys Pro Ala Glu Ser Trp His
1300 1305 1310

Leu Ser Pro Leu Leu Cys Val Gly Leu Trp Ala Leu Arg Leu Trp Gly
1315 1320 1325

Ala Leu Arg Leu Gly Ala Val Ile Leu Arg Trp Arg Tyr His Ala Leu
1330 1335 1340

Arg Gly Glu Leu Tyr Arg Pro Ala Trp Glu Pro Gln Asp Tyr Glu Met
1345 1350 1355 1360

Val Glu Leu Phe Leu Arg Arg Leu Arg Leu Trp Met Gly Leu Ser Lys
1365 1370 1375

Val Lys Glu Phe Arg His Lys Val Arg Phe Glu Gly Met Glu Pro Leu
1380 1385 1390

Pro Ser Arg Ser Ser Arg Gly Ser Lys Val Ser Pro Asp Val Pro Pro
1395 1400 1405

Pro Ser Ala Gly Ser Asp Ala Ser His Pro Ser Thr Ser Ser Ser Gln
1410 1415 1420

Leu Asp Gly Leu Ser Val Ser Leu Gly Arg Leu Gly Thr Arg Cys Glu
1425 1430 1435 1440

Pro Glu Pro Ser Arg Leu Gln Ala Val Phe Glu Ala Leu Leu Thr Gln
1445 1450 1455

Phe Asp Arg Leu Asn Gln Ala Thr Glu Asp Val Tyr Gln Leu Glu Gln
1460 1465 1470

Gln Leu His Ser Leu Gln Gly Arg Arg Ser Ser Arg Ala Pro Ala Gly
1475 1480 1485

Ser Ser Arg Gly Pro Ser Pro Gly Leu Arg Pro Ala Leu Pro Ser Arg
1490 1495 1500

Leu Ala Arg Ala Ser Arg Gly Val Asp Leu Ala Thr Gly Pro Ser Arg
1505 1510 1515 1520

Thr Pro Ser Gly Gln Glu Gln Gly Pro Pro Gln Gln His Leu Val Leu
1525 1530 1535







Leu Pro Gly Gly Gly Gly Pro Trp Ser Arg Ser Gly His Arg Ser Val
1540 1545 1550 

Leu Leu Ser Ala Ala Val Lys Ala Glu Gly Gin Ala Glu Trp Leu His 
1555 1560 1565

Val Gly Ser Pro Glu Ser Arg Gln Gly His Leu Ser Val Cys Gly Leu
1570 1575 1580

Gln His Phe Lys Glu Ala Val Trp Pro Thr Arg Thr Gln Gly Pro Leu
1585 1590 1595 1600

Pro Ser Ser Leu Gly Lys Asp Thr Ala Val Leu Asp Gly Phe
1605 1610 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: (Compare Figure 8) 

AGCITGGCAC CATCAAGGGC CAGTTCAACT TTCTOCADGT GATOCTCAOC COGCTGGACT 60 


AOGACTQCAA CCTGCTCTCC CTGCACTQCA GGAAAGACAT GGAGGQOCTT GTIGGACACCA 120

G0CTGGC3CAA G ATO GTCTCT GACOGCAACC IV^UULTIXXr T GGOOOGOCAG ATGGCCCTGC 180 

AOGCAAATAT GGOCTCACAG GTOCATCATA GOOGCTGGAA C00CAC0GAT ATCTAOOCCT 240

OCAAGTGGAT TGOOCGGCIC OGOCACATCA AGaZSCPOOG CCAGOQGATC TGOGAGGAAG 300 

CCTOCTACTC CAAOCOCAGC CTAOCICTGG TGGADC3CT0C CTCCCATAGC AAAGOOCCIG 360

OOGACTOC AG00GAGC0C ACACCTGGCT ATGAGCTGQG CCAGOGGAAG OGCCTCATCT 420

CCTOGGTOGA GGACITCAOC GAG Tl 'lUlGT GAGGOOGGGG OX'IUX'ICC TGCACTGGOC 480 

TTGGACGGTA TTGCCICTCA GTCAAATAAA TAAAGTCCTG ACOCCAGTGC ACAGACATAG 540 

AGGCACAGAT TGC 553 


(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: (Compare Figure 9)

CTGCTGICTG TGAGACGTGC GGGGCTGGGA AGTOITOQCA GAGOOGOGAG TACOGICCTC 60 

ACTOCTTTTG TICTTTIGAC GTAAGCTGGC GAGTGQCACT GCCTGAGITC CGCTCAGTGC 120 

COGCCCTGAT GTGCGGACCC OGCTGCATTC TTOCT7ITAG GTGCTGGCGG TGTOCQCTGT 180

CACOGAGACT CTTTGGGAGC TTTGGGGAGG TTGTQCCAAG CCTGAGCCTC 240 

TTOCCGGCTT TCTGTTGGCT CTICTGAGGC CAGGGCATCT CTATGAQGGC 300 

CTCCTGCTGG AGCCGTCTCT CTGGATCTCC TCTGCCATOC TGGOOCATGA GTOQGTGATG 360 

CGCTGGCCAC CATCTGGTGA CAGTOGCOGG GCACOQCTGC CAAATGTGGG TCOCGCATCT 420

GCAAGCCOCT COCTGQGTOC CCTAGGGTAT GQGGnGGSTC TG0CACK3CC CIOQCT000C 480 

CADCTTGGGG TGOCTCTOOC CCTGCTOGIG GGGGAGA 517 

1 GCACTGCAGCGCCAGCGTCCGAGCGGGCGGCCGAGCTCCCGGAGCGGCCTGGCCCCGAGC 60

61 CCCGAGCGGGCGTCGCTCAGCAGCAGGTCGCGGCCGCGCAGCCCCATCCAGCCCCGCGCC 120

121 CGCCATGCCGTCCGCGGGCCCCGCCTGAGCTGCGGTCTCCGCGCGCGGGCGGGCCTGGGG 180

181 ACGGCGGGGCCATGCGCGCGCTGCCCTAAGGATGCCGCCCGCCGCGCCCGCCCGCCTGGC 240
1 M P P A A P A R L A 10

241 GCTGGCCCTGGGCCTGGGCCTGTGGCTCGGGGGGCTGGCGGGGGGCCCCGGGCGCGGCTG 300 

11 L A L G L G L W L G A L A G G P G R G C 30

301 CGGGCCCTGCGAGCCCCCCTGCCTCTGCGGCCCAGCGCCCGGCGCCGCCTGCCGCGTCAA 360

31 G P C E P P C L C G P A P G A A C R V N 50

361 CTGCTCGGGCCGCGGGCTGCGGACGCTCGGTCCCGCGCTGCGCATCCCCGCGGACGCCAC 420 

51 CSGRGLRTLG PALRI PADAT 70 

421 AGCGCTAGACGTCTCCCACAACCTGCTCCGGGCGCTGGACGTTGGGCTCCTGGCGAACCT 480 

71 ALDVSHNLLRALDVGLLANL 90 

481 CTCGGCGCTGGCAGAGCTGG ATATAAGCAACAAC AAGATTTCTACGTTAGAAG AAGGAAT 540 

91 SALAELDI SNNKISTLEEGI 110 

541 ATITGCTAATTTATTTAATTTAAGTGAAA^ 600 

111 FANLFN LSEI NLSGNPFECD 130 

601 CTGTGGCCTGGCGTGGCTGCCGCGATGGGCGGAGGAGCAGCAGGTGCGGGTGGTGCAGCC 660 

131 C G L A W L P R W A E E Q Q V R V V Q P 150

661 CG AGGCAGCCACGTGTGCTCGGCCTGGCTCC^ 720 

151 .E A A T C A .G P G S Ii" A G'^' I? 'L G I P 170 

721 CTTGCTGGACAGTGGCTGT^TGAGGAGTATGTCGCC 780 

171 LLDSGCGEEYVACLPDNSSG 190

781 CACCGTGGCAGCAGTGTCCTTTTCAGCTGCCCACGAAGGCCTGCTTCAGCCAGAGGCCTG 840

191 TVAAVSFSAAHEGLLQPEAC 210

841 CAGCGCCTTCTGCTTCTCCACCGGCCAGGGCCTCGCAGCCCTCTCGGAGCAGGGCTGGTG 900 

211 S A F C F S T G Q G L A A L S E Q G W C 230

901 CCTGTGTGGGGCGGCCCAGCCCTCCAGTGCCTCCTTTGCCTGCCTGTCCCTCTGCTCCGG 960 

231 L C G A A Q P S S A S F A C L S L C S G 250

961 CCCCCCGCCACCTCCTGCCCCCACCTGTAGGGGCCCCACCCTCCTCCAGCACGTCTTCCC 1020

251 PPPPPAPTCRGPTLLQHVFP 270

1021 TGCCTCCCCAGGGGCCACCCTGGTGGGGCCCCACGGACCTCTGGCCTCTGGCCAGCTAGC 1080 

271 A S P G A T L V G P H G P L A S G Q L A 290


1081 AGCCTTCCACATCGCTGCCCCGCTCCCTGTCACTGCCACACGCTGGGACTTCGGAGACGG 1140 

291 A F H I A A P L P V T A T R W D F G D G 310

* 

1141 CTCCGCCGAGGTGGATGCCGCTGGGCCGGCTGCGTCGCATCGCTATGTGCTGCCTGGGCG 1200

311 SAEVDAAGPAASHRYVLPGR 330

Fig. 15

1201 CTATCACGTGACGGCCGTGCTGGCCCTGGGGGCGGGCTCAGGCCTGCTGGGGACAGACGT 1260

331 YHYTAVLALGAGSALLGTDV 350

1261 GCAGGTGGAAGCGGCACCTGCCGCCCTGGAGCTCGTGTGCCCGTCCTCGGTGCAGAGTGA 1320

351 QVEAAPAALELVCPSSVQSD 370

1321 CGAGAGCCTTGACCTCAGCATCCAGAACCGCGGTGGTTCAGGCCTGGAGGCCGCCTACAG 1380 

371 ESLDLSIQNR GG SGLEAAYS 390 

1381 CATCGTGGCCCTGGGCGAGGAGCCGGCCCGAGCGGTGCACCCGCTCTGCCCCTCGGACAC 1440

391 IVALGEEPARAVHPLCPSDT 410

1441 GGAGATCTTCCCTGGCAACGGGCACTGCTACCGCCTGGTGGTGGAGAAGGCGGCCTGGCT 1500

411 EIFPGNGHCYRLVVEKAAWL 430

1501 GCAGGCGCAGGAGCAGTGTCAGGCCTGGGCCGGGGCCGCCCTGGCAATGGTGGACAGTCC 1560

431 QAQEQCQAWAGAALAMVDSP 450

1561 CGCCGTGCAGCGCTTCCTGGTCTCCCGGGTCACCAGGAGCCTAGACGTGTGGATCGGCTT 1620

451 A V Q R F L V S R V T R S L D V W I G F 470

1621 CTCGACTGTGCAGGGGGTGGAGGTGGGCCCAGCGCCGGAdGGCGAGGCCTTCAGCCTGGA 1680

471 S T V Q G V E V G P A P Q G E A F S L E 490

1681 GAGCTGCCAGAACTGGCITCCCGGGGAGCC^^ 1740 

491 S C Q N W L P E P H ? P A T A E H C V R * 510 

1741 GCTCGGGCCC^CCXX^TGGTCTAACACCGACCTCTCCTCAGCG 1800 

511 L G P T G W C N T D L C S A P H S Y V C 530

1801 CGAGCTGCAGCCCGGAGGCCCAGTGCAGGATGC 1860

531 E L Q P G G P V Q D A E N L L V G A P S 550

1861 TGGGGACCTGCAGGGACCCCTGAC'GCCTCT^ * 1920 

551 G D L Q G P L T P L A Q Q D G L S A P H 570

1921 CGAGCCCGTGGAGGTCATGGTATTCCCGGGCCTGCGTCTGAGCCGTGAAGCCTTCCTCAC 1980

571 E P V E V M V F P G L R L S R E A F L T 590

1981 CACGGCCGAATTTCGGACCCAGGAGCTCCGGCGGCCCGCCCAGCTGCGGCTGCAGGTGTA 2040

591 T A E F G T Q E L R R P A Q L R L Q V Y 610

2041 CCGGCTCCTCAGCACAGCAGGGACCCCGGAGAACGGCAGCGAGCCTGAGAGCAGGTCCCC 2100 

611 RLLSTAGTPENGSEPESRSP 630 
* ■ ■ * 

2101 GGACAACAGGACCCAGCTGGCCCCCGCGTGCATGCCAGGGGGACGCTGGTGCCCTGGAGC 2160

631 DNRTQL A PACMPGGRWC PGA 650 

* V * ■ ' ■ . • • ' 

2161 CAACATCTGCTTGCCGCTGGACGCCTCTTGCCACCCCCAGGCCTGCGCCAATGGCTGCAC 2220

651 N I C L P LDA S C H P QAC AN G C T 670 

2221 GTCAGGGCCAGGGCTACCCGGGGCCCCCTATGCGCTATGGAGAGAGTTCCTCTTCTCCGT 2280 

671 SGPGLPGAPYALWREFLFSV 690 

2281 * TGCCGCGGGGCCCCCCGCGCAGTACTCGGTCACCCTCCACGGCCAGGATGTCCTCATGCT 2340 

691 AAGPPAQYSVTLHGQDVLML 710 

2341 * CCCTGGTGACCTCGTTGGCTTGCAGCACGACGCTGGCCCTGGCGCCCTCCTGCACTGCTC 2400 

711 P G D LVGL QH DAG PGAL L HC S 730 

2401 GCCGGCTCCCGGCCACCCTGGTCCCCAGGCCCCGTACCTC 2460

731 PAPGHPGPQAPYLSANASSW 750

* 

2461 GCTGCCCCACtTGCCAGCCCAGCTGGAGGGCACT^ - 2520 ' 

751 LPHLPAQLEGTWACPACALR 770

2521 GCTGCTTGCAGCCACGGAACAGCTGACGGTGCTCCTGGGCTTC 2580

771 L L A A T E Q C T V L L G L R P N P 0 L 790 

2581 GCGGATGCCTGGGCGCTATGAGGTCCGGGCAGAGGTC 2640

791 RMPGRYEVRAEVGNGVSRHN 810

* 

2641 CCTCTCCTGCAGCTTTGACGT(^TGTCCCCAGTGC^TG'GGCTC ' 2700 

811 LSCSFDVVSPVAGLRVIYPA 830

2701 CCCCCGCGACGGCCGCCTCTACGTGCCCACCAACGGCTCAGCCTTGGTGCTCCAGGTGGA 2760

831 PRDGRLYVPTNGSALVLQVD 850

* 

2761 CTCTGGTGCCAACGCCACGGCCACGGGTCGCTGGCCTGGGGGGAGTGTCAGCGCCCGCTT 2820

851 S G A N A T A T A R W P G G S V S A R F 870
* 

2821 TGAGAATGTCTGCCCTGCCCrcGTGqCCACCTTCGTGCCCOGCTGCCCCTCGG * 2 B 8 0 ' 

871 E N V C P A L V A T F V P G C P W E T N 890

2881 CGATACCCTGTTCTCAGTGGTAGCACTGCGGTGGCTCAGTGAGGGGGAGCACGTGGTGGA 2940

891 D T L F S V V A L P W L S E G E H V V D 910

2941 CGTGGTGGTGGAAAACAGCGCCAGCCGGGCCAACCTCAGCCTGCGGGTGACGGCGGAGGA 3000

911 V V V E N S A S R A N L S L R V T A E E 930

3001 GCCCATCTGTGGCCTCCGCGCCACGCCCAGCCGGGAGGCCCGTGTACTGCAGGGAGTCCT 3060
931, P f C G L R; A T* T pS"% p fe.. A R V L Q G V- L ^ ■ 950 , 

3061 AGTGAGGTACAGCCCCGTGGTGGAGGCCGGCTCGGACATGGTCTTCCGGTGGACCATCAA 3120 

951 V R Y S P V V E A G S D M V F R W T I N 970

3121 CGACAAGCAGTCCCTGACCTTCCAGAACGTGGTCTTCAATGTCATTTATCAGAGCGCGGC 3180

971 D K Q S L T F Q N V V F N V I Y Q S A A 990

3181 GGTCTTCAAGCTCTCACTGACGGCCTCCAACCACGTGAGCAACGTCACCGTGAACTACAA 3240

991 V F K L S L T A S N H V S N V T V N Y N 1010

.'**'>• 

3241 CGTAACCGTGGAGCGGATGAACAGGATGCAGGGTCTGCAGGTCTCCACAGTGCCGGCCGT 3300 

1011 V T V E R M N R M Q G L Q V S T V P A V 1030

3301 GCTGTCCCCCAATGCCACACTGGTACTGACGGGTGGTGTGCTGGTGGACTCAGCTGTGGA 3360

1031 L S P N A T L V L T G G V L V D S A V E 1050

3361 GGTGGCCTTCCTGTGGAACTTTGGGGATGGGGAGCAGGCCCTCCACCAGTTCCAGCCTCC 3420

1051 V A F L W N F G D G E Q A L H Q F Q P P 1070

3421 GTACAACGAGTCCTTCCCGGTTCCAGACCCCTCGGTGGCCCAGGTGCTGGTGGAGCACAA 3480

1071 Y N E S F P V P D P S V A Q V L V E H N 1090

* „ . ; 

3481 TGTCATGCACAGCTACGCTGCCCCAGGTGAGTACCTCCTGACCGTGCTGGCATCTAATGC 3540 

1091 V M H T Y A A P G E Y L V T V L A S N A 1110

3541 CTTCGAGAACCTGACGCAGCAGGTGCCTGTGAGCGTGCGCGCCTCCCTGCCCTCCGTGGC 3600

1111 F E N L T Q Q V P W S V R A S L P S V A 1130

3601 TGTGGGTGTGAGTGACGGCGTCCTGGTGGCCGGCCGGCCCGTCACCTTCTACCCGCACCC 3660

1131 V G V S D G V L V A G R P V T F Y P H P 1150



3661 GCTGCCCTCGCCTGGGGGTGTTCTTTACACGTGGGACTTCGGGGACGGCTCCCCTGTCCT 3720

1151 LPSPGGVLYTWDFGDGSPVL 1170



3721 GACCCAGAGCCAGCCGGCTGCCAACCACACCTATGCCTCGAGGGGCACCTACCACGTGCG 3780

1171 T Q S Q P A A N H T Y A S R G T Y H V R 1190

3781 CCTGGAGGTCAACAACACGGTGAGCGGTGCGGCGGCCCAGGCGGATGTGCGCGTCTTTGA 3840

1191 L E V N N T V S G A A A Q A D V R V F E 1210

3841 GGAGCTCCGCGGACTCAGCGTGGACATGAGCCTGGCCGTGGAGCAGGGCGCCCCCGTGGT 3900

1211 E L R G L S V D M S L A V E Q G A P V V 1230

3901 GGTCAGCGCCGCGGTGCAGACGGGCGACAACATCACGTGGACCTTCGACATGGGGGACGG 3960

1231 V S A A V Q T G D N I T W T F D M G D G 1250

::. * 1 

3961 CACCGTGCTGTCGGGCCCGGAGGCAACAGTGGAGCATGTGTACCTGCGGGCACAGAACTG 4020 

1251 T V L S G P E A T V E H V Y L R A Q N C 1270

• - * - 

4021 CACAGTGACCGTGGGTGCGGCCAGCCCCGCCGGCCACCTGGCCCGGAGCCTGCACGTGCT 4080

1271 T V T V G A A S P A G H L A R S L H V L 1290

4081 GGTCTTCGTCCTGGAGGTGCTGCGCGTTGAACCCGCCGCCTGCATCCCCACGCAGCCTGA 4140

1291 V F V L E V L R V E P A A C I P T Q P D 1310

4141 CGCGCGGCTCACGGCCTACGTCACCGGGAACCCGGCCCACTACCTCTTCGACTGGACCTT 4200

1311 A R L T A Y V T G N P A H Y L F D W T T 1330

4201 . CGGGGATGGCTCCTCCAACACGACCGTGCGGGG£TGCCCGACGGTGACACACAACTTCAC. 4260 

1331 G D G S S N T T V R G C P T V T H N T T 1350

4261 GCGGAGCGGCACGTTCCCGCTGGCGCTGGTGCTGTCCAGCCGCGTGAACAGGGCGCATTA 4320

1351 R S G T F P L A L V L S S R V N R A H Y 1370

4321 CTTCACCAGCATCTGCGTGGAGCCAGAGGTGGGCAACGTCAGCCTGCAGCCAGAGAGGCA 4380

1371 F • T I* C V E -P S V " G"~ N V T L Q* P E R ' Q 1390 

4381 GTTTGTGCAGCTCGGGGACGAGGCCTGGCTGGTGGCATGTGCCTGGCCCCCGTTCCCCTA 4440

1391 F V Q L G D E A W L V A C A W P P F P Y 1410

4441 CCGCTACACCTGGGACTTTGGCACCGAGGAAGCCGCCCCCACCCGTGCCAGGGGCCCTGA 4500

1411 R Y T W D F G T E E A A P T R A R G P E 1430

4501 GGTGACGTTCATCTACCGAGACCCAGGCTCCTATCTTGTGACAGTCACCGCGTCCAACAA 4560

1431 V T F I Y R D P G S Y L V T V T A S N N 1450

4561 CATCTCTGCTGCCAATGACTCAGCCCTGGTGGAGGTGCAGGAGCCCGTGCTGGTCACCAG 4620 

1451 I S A A N D S A L V E V Q E P V L V T S 1470

4621 CATCAAGGTCAATGGCTCCCTTGGGCTGGAGCTGCAGCAGCCGTACCTGTTCTCTGCTGT 4680 

1471 I K V N G S L G L E L Q Q P Y L F S A V 1490

4681 GGGCCGTGGGCGCCCCGCCAGCTAGCTGTGGGATCTGGGGGACGGTGGGTGGCTCGAGGG 4740 

1491 G R G R P A S Y L W D L G D G G W L E G 1510

4741 TCCGGAGGTCACCCACGCTTACAACAGCACAGGTGACTTCACCGTTAGGGTGGCCGGCTG 4800

1511 P E V T H A Y N S T G D F T V R V A G W 1530

- * 

4801 GAATGAGGTGAGCCGCAGCGAGGCCTGGGTCAATCTGACOSTGAAGCGGCGCGTGCGGGG 4860

1531 NEVSRSEAWLNVTVKRRVRG 1550 

- * * 

4861 GCTCGTCGTCAATGCAAGCCGCACGGTGGTGCCCCTGAATGGGAGCGTGAGCTTCAGCAC 4920

1551 L V V N A S R T V V P L N G S V S F S T 1570

* .. . , . * 

4921 GTCGCTGGAGGCCGGCAGTdATGTGCGCTATTC 4980 

1571 SLEAGSDVRYSWVLCDRCTP 1590

4981 ttTCCCTGOSGGTCCTACC AT^TTACACCTTCCGCTCCGTGGGCACC^CAATATCAT 5040 

1591 IPGGPTISYTFRSVGTFNII 1610

5041 CGTCACGGCTGAGAACGAGGTGGGCTCCGCC 5100 

1611 VTAENEVGSAQDSIFVYVLQ 1630

5101 GCTCATAGAG<^GCTGGAGGTGGTG^CGGTGGCCGCTACTTCCCCACCAA 5160 

1631 LIEGLQVVGGGRYFPTNHTV 1650

5161 ACAGCTGCAGGCCGTGGTTAGGGATCGCACCAACGTCTCCTACAGCT^ 5220 

1651 QLQAVVRDGTNVSYSWTAWR 1670

...... -....* - - 

5 2.2 1 GGACAGCXK^CCGGCCCTGGCCGGCAGCGGCAAAGGCTTCTCGCTCACCGTGCTCGAGGC 5280 

1671 DRGPALAGSGKGFSLTVLEA 1690

5281 CGGCACCTACCATGTGCAGCTGCGGdCCACdAACATGCTG^ 5340 

1691 G T Y H V Q L R A T N M L G S A W A D C 1710 

5 3.4 1 CACCATGGACTTCGTGGAGCCTGTCX^ 5400 

1711 T M D F V E P V G W L M V T A S P N P A 1730

5401 TGCCGTCAACACAAGCGTCACCCTCAG^^ 5460 

1731 A V N T S V T L S A E L A G G S G V V Y 1750

* , . . . ; / 

■5461 C ACTTGGTCCTTGGAGGAGGGGCTGAGCtGGGAGACCTCCGAGCCATTTA^ 5520 

1751 TWSLEEGLSWETSEPFTTHS 1770

5521 CTTCCCCACACCCGGCCTGCACTTGGTCACCAT^ 5580 

1771 F P T P G L H L V T M T A G N P L G S A 1790 

5581 CAACGCCACCGTGGAAGTGGATGTGCAGGTGCCTCTGAGTCGCCTCAGCATCAGGGCCAG 5640 

1791 NATVEVDVQVPVSGLSIRAS 1810

* ' . . - 

5641 CGAGCCCGGAGGCAGCTTCGTGGCGGCCGGGTGCTCTGTCCGCTTTTGGGGGCAGCTGG 5700

1811 E P G G S F V A A G S S V P F W G Q L A 1830

5701 CACGGGCACCAATGTGAGCTGGTCCTGGGCTGTCCCCGGCGC^^ " 5 7 6,0 

1831 TG T NV S W C W* A" V P G G " S S K R G ? 1850 

. * .. - , . - . 

5761 TCATGTCACCATGGTCTTCCCGGATGCTGGCACC 5820

1851 HVTMVFPDAGTFSIRLNASN 1870

.. - - * 

5821 CGCAGTCAGCTGGGTCTCAGCCACGTACAACCTCACGGCGGAGGAGCCCATCGTGGGCCT 5880

1871 AV S WV S AT Y N LTA E E P I V G L 1890 

... . , - r 

5881 GGTGCTGTGGGCCAGCAGCAAGGTGGTGGCGCCCGGGCAGCTGGTCCATTTTCAGATCCT 5940

1891 VLWASSKVVAPGQLVHFQIL 1910

5941 GCTGGCTGCCGGCTCAGCTGTCACCTTCCGCCTCCAGGTC 6000

1911 L A A G S A V T F R L Q V G G A N P E V 1930

6001 GCTCCCCGGGCCCCGTTTCTCCCACAGCTTCCCCCGCGTCGGAGACCACGTGGTGAGCGT 6060

1931 LPGPRFSHSFPRVGDHVVSV 1950 

6061 GCGGGGCAAAAACCACGTGAGCTGGGCCCAGGCGCAGGTGCGCATCGTGGTGCTGGAGGC 6120

1951 RGKNHVSWAQAQVRIVVLSA 1970 

6121 CGTGAGTGGGCTGCAGATGCCCAACTGCTGCGAGCCTGGCATCGCCACGGGCACTGAGAG 6180 

1971 VSGLQMPNCCEPGIATGTER 1990

6181 GAACTTCACAGCCCGCGTGCAGCGCGGCTCTCGGGTCGCCTACGCCTGGTACTTCTCGCT 6240

1991 NFTARVQRGSRVAYAWYFSL 2010

6241 GCAGAAGGTCCAGGGCGACTCGCTGGTCATCCTGTCGGGCCGCGACGTCACCTACACGCC 6300

2011 Q KVQGD S LV I LSG RDVTY T P 2030 

6301 CGTGGCCGCGGGGCTGTTGGAGATCCAGGTGCGCGCCTTCAACGCCCTGGGCAGTGAGAA 6360

2031 VAAGLLE IQVRAFNALGS EN 2050 

* 

6361 CCGCACGCTGGTGCTGGAGGTTCAGGACGCCGTCCAGTATGTGGCCCTGCAGAGCGGCCC 6420 

2051 RTLVLEVQ DAVQYVALQSGP 2070 

6421 CTGCTTCACCAACCGCTCGGCGCAGTTTGAGGCCGCCACCAGCCCCAGCCCCCGGCGTGT 6480 

2071 CFTNRSAQFEAATSPS PRRV 2090 
* 

6481 GGCCTACCACTGGGACTTTGGGGATGGGTCGCCAGGGCAGGACACAGATGAGCCCAGGGC 6540 

2091 AYHWD FGDGS PGQDTDEPRA 2110 

6541 CGAGCACTCCTACCTGAGGCCTGGGGACTACCGCGTGCAGGTGAACGCCTCCAACCTGGT 6600 

2111 EHSYLRPGDYRVQVNASNLV 2130

6601 GAGCTTCTTCGTGGCGCAGGCCACGGTGACCGTCCAGGTGCTGGCCTGCCGGGAGCCGGA 6660

2131 SFFVAQATVTVQVLACREPE 2150

6661 GGTGGACGTGGTCCTGCCCCTGCAGGTGCTGATGCGGCGATCACAGCGCAACTACTTGGA 6720

2151 VDVVLPLQVLMRRSQRNYLE 2170

6721 GGCCCACGTTGACCTGCGCGACTGCGTCACCTACCAGACTGAGTACCGCTGGGAGGTGTA 6780

2171 AHVDLRDCVTYQTEYRWEVY 2190

6781 TCGCACCGCCAGCTGCCAGCGGCCGGGGCGCCCAGCGCGTGTGGCCCTGCCCGGCGTGGA 6840

2191 RTASCQRPGRPARVALPGVD 2210

6841 CGTGAGCCGGCCTCGGCTGGTGCTGCCGCGGCTGGCGCTGCCTGTGGGGCACTACTGCTT 6900 

2211 VSRPRLVLPRLALPVGHYCF 2230 

6901 TGTGTTTGTCGTGTCATTTGGGGACACGCCACTGACACAGAGCATCCAGGCCAATGTGAC 6960 

2231 VFVVSFGDTPLTQSIQANVT 2250

6961 GGTGGCCCCCGAGCGCCTGGTGCCCATCATTGAGGGTGGCTCATACCGCGTGTGGTCAGA 7020

2251 VAPERLVPIIEGGSYRVWSD 2270

7021 CACACGGGACCTGGTGCTGGATGGGAGCGAGTCCTACGACCCCAACCTGGAGGACGGCGA 7080

2271 TRDLVLDGSESYDPNLEDGD 2290 

7081 CCAGACGCCGCTCAGTTTCCACTGGGCCTGTGTGGCTTCGACACAGAGGGAGGCTGGCGG 7140 

2291 QTPLSFHW ACVASTQREAGG 2310 

7141 GTGTGCGCTGAACTTTGGGCCCCGCGGGAGCAGCACGGTCACCATTCCACGGGAGCGGCT 7200

2311 CALNFGPRGSSTVTI PRERL 2330 